Abstract. We consider two-player games played on weighted directed graphs with mean-payoff and total-payoff
objectives, which are two classical quantitative objectives. While for single-dimensional objectives all results for
mean-payoff and total-payoff coincide, we show that in contrast to multi-dimensional mean-payoff games that are
known to be coNP-complete, multi-dimensional total-payoff games are undecidable. We introduce conservative
approximations of these objectives, where the payoff is considered over a local finite window sliding along a play,
instead of the whole play. For single dimension, we show that (i) if the window size is polynomial, then the problem
can be solved in polynomial time, and (ii) the existence of a bounded window can be decided in NP ∩ coNP, and
is at least as hard as solving mean-payoff games. For multiple dimensions, we show that (i) the problem with fixed
window size is EXPTIME-complete, and (ii) there is no primitive-recursive algorithm to decide the existence of a
bounded window.

1 Introduction 

Mean-payoff and total-payoff games. Two-player mean-payoff and total-payoff games are played on finite weighted
directed graphs (in which every edge has an integer weight) with two types of vertices: in player-1 vertices, player 1
chooses the successor vertex from the set of outgoing edges; in player-2 vertices, player 2 does likewise. The game
results in an infinite path (called a play) through the graph. The mean-payoff (resp. total-payoff) value of a play is the
long-run average (resp. sum) of the edge-weights along the path.

Decision problems. The decision problem for mean-payoff and total-payoff games asks, given a starting vertex,
whether player 1 has a strategy that against all strategies of the opponent ensures a play with value at least 0. Both for
mean-payoff and total-payoff games, memoryless winning strategies exist for both players (where a memoryless
strategy is independent of the past and depends only on the current state) [8,13]. This ensures that the decision problems
belong to NP ∩ coNP; they belong to the intriguing class of problems that are in NP ∩ coNP but for which membership
in P (deterministic polynomial time) is a long-standing open question. The study of mean-payoff games has also been
extended to multiple dimensions where the problem is shown to be coNP-complete [27,4]. While for one dimension all
the results for mean-payoff and total-payoff coincide, our first contribution shows that quite unexpectedly (in contrast
to multi-dimensional mean-payoff games) the multi-dimensional total-payoff games are undecidable.

Window objectives. On the one hand, the complexity of single-dimensional mean-payoff and total-payoff games is a
long-standing open problem, and on the other hand, the multi-dimensional problem is undecidable for total-payoff
games. In this work, we propose to study variants of these objectives, namely, bounded window mean-payoff and fixed
window mean-payoff objectives. In a bounded window mean-payoff objective, instead of the long-run average along
the whole play, we consider payoffs over a local bounded window sliding along a play, and the objective is that the
average weight must be at least zero over every bounded window from some point on. This objective can be seen as a
strengthening of the mean-payoff objective (resp. of the total-payoff objective if we require that the window objective
is satisfied from the beginning of the play rather than from some point on), i.e., winning for the bounded window
mean-payoff objective implies winning for the mean-payoff objective. In the fixed window mean-payoff objective the
window length is fixed and given as a parameter. Observe that winning for the fixed window objective implies winning
for the bounded window objective.

Table 1: Complexity of deciding the winner and memory required, with $|S|$ the number of states of the game (vertices
in the graph), $V$ the length of the binary encoding of weights, and $l_{\max}$ the window size. New results in bold (h. for
hard and c. for complete).

Our contributions. The main contributions of this work (along with the undecidability of multi-dimensional total- 
payoff games) are as follows: 

1. Single dimension. For the single-dimensional case we present an algorithm for the fixed window problem that is
polynomial in the size of the game graph times the length of the binary encoding of weights times the size of
the fixed window. Thus if the window size is polynomial, we have a polynomial-time algorithm. For the bounded
window problem we show that the decision problem is in NP ∩ coNP, and at least as hard as solving mean-payoff
games. However, winning for mean-payoff games does not imply winning for the bounded window mean-payoff
objective, i.e., the winning sets for mean-payoff games and bounded window mean-payoff games do not coincide.
Moreover, the structure of winning strategies is also very different, e.g., in mean-payoff games both players have
memoryless winning strategies, but in bounded window mean-payoff games we show that player 2 requires infinite
memory. We also show that if player 1 wins the bounded window mean-payoff objective, then a window of size
$(|S| - 1) \cdot (|S| \cdot W + 1)$ is sufficient, where $S$ is the state space (the set of vertices of the graph) and $W$ is the
largest absolute weight value. Finally, we show that (i) a winning strategy for the bounded window mean-payoff
objective ensures that the mean-payoff is at least 0 regardless of the strategy of the opponent, and (ii) a strategy
that ensures that the mean-payoff is strictly greater than 0 is winning for the bounded window mean-payoff objective.

2. Multiple dimensions. For multiple dimensions, we show that the fixed window problem is EXPTIME-complete
(both for arbitrary dimensions with weights in $\{-1, 0, 1\}$ and for two dimensions with arbitrary weights); and if
the window size is polynomial, then the problem is PSPACE-hard. For the bounded window problem we show that
the problem is non-primitive recursive hard (i.e., there is no primitive recursive algorithm to decide the problem).

3. Memory requirements. For all the problems for which we prove decidability we also characterize the memory 
required by winning strategies. 

The relevant results are summarized in Table 1, where our results are in bold font. In summary, the fixed window
problem provides an attractive approximation of the mean-payoff and total-payoff games, which we show to have
better algorithmic complexity. In contrast to the long-standing open problem of mean-payoff games, the one-dimension
fixed window problem with polynomial window size can be solved in polynomial time; and in contrast to the
undecidability of multi-dimensional total-payoff games, the multi-dimension fixed window problem is EXPTIME-complete.

Related works. Mean-payoff games were first studied by Ehrenfeucht and Mycielski in [8], where it is shown that
memoryless winning strategies exist for both players. This entails that the decision problem lies in NP ∩ coNP [18,28],
and it was later shown to belong to UP ∩ coUP [16]. Despite many efforts [14,28,24,20,15], no polynomial-time
algorithm for the mean-payoff games problem is known so far. Gurvich, Karzanov, Khachiyan and Lebedev [14,18]
provided the first (exponential) algorithm for mean-payoff games, later extended by Pisaruk [24]. The first
pseudo-polynomial-time algorithm for mean-payoff games was given in [28] and was improved in [2]. Lifshits and
Pavlov [20] propose an algorithm which is polynomial in the encoding of weights but exponential in the number of
vertices of the graph: it is based on a graph decomposition procedure. Björklund and Vorobyov [15] present a
randomized algorithm which is both subexponential and pseudo-polynomial. While all the above works are for single
dimension, multi-dimensional mean-payoff games have been studied in [27,4,6]. One-dimension total-payoff games have
been studied in [12], where it is shown that memoryless winning strategies exist for both players and the decision
problem is in UP ∩ coUP.

2 Preliminaries 

We consider two-player turn-based games and denote the two players by $\mathcal{P}_1$ and $\mathcal{P}_2$.

Multi-weighted two-player game structures. Multi-weighted two-player game structures are weighted graphs
$G = (S_1, S_2, E, k, w)$ where (i) $S_1$ and $S_2$ resp. denote the finite sets of vertices, called states, belonging to
$\mathcal{P}_1$ and $\mathcal{P}_2$, with $S_1 \cap S_2 = \emptyset$ and $S = S_1 \cup S_2$; (ii) $E \subseteq S \times S$ is the set of edges such
that for all $s \in S$, there exists $s' \in S$ with $(s, s') \in E$; (iii) $k \in \mathbb{N}$ is the dimension of the weight vectors; and
(iv) $w \colon E \to \mathbb{Z}^k$ is the multi-weight labeling function. The game structure $G$ is one-player if $S_2 = \emptyset$. We
denote by $W$ the largest absolute weight that appears in the game. For complexity issues, we assume that weights are
encoded in binary. Hence we differentiate between pseudo-polynomial algorithms (polynomial in $W$) and truly
polynomial algorithms (polynomial in $V = \lceil \log_2 W \rceil$, the number of bits needed to encode the weights).

A play in $G$ from an initial state $s_{init} \in S$ is an infinite sequence of states $\pi = s_0 s_1 s_2 \ldots$ such that
$s_0 = s_{init}$ and $(s_i, s_{i+1}) \in E$ for all $i \ge 0$. The prefix up to the $n$-th state of $\pi$ is the finite sequence
$\pi(n) = s_0 s_1 \ldots s_n$. Let $\mathsf{Last}(\pi(n)) = s_n$ denote the last state of $\pi(n)$. A prefix $\pi(n)$ belongs to
$\mathcal{P}_i$, $i \in \{1, 2\}$, if $\mathsf{Last}(\pi(n)) \in S_i$. The set of plays of $G$ is denoted by $\mathsf{Plays}(G)$ and the corresponding
set of prefixes is denoted by $\mathsf{Prefs}(G)$. The set of prefixes that belong to $\mathcal{P}_i$ is denoted by $\mathsf{Prefs}_i(G)$. The
infinite suffix of a play starting in $s_n$ is denoted $\pi(n, \infty)$.

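The total-payoff of a prefix $\rho = s_0 s_1 \ldots s_n$ is $\mathsf{TP}(\rho) = \sum_{i=0}^{n-1} w(s_i, s_{i+1})$, and its mean-payoff is
$\mathsf{MP}(\rho) = \frac{1}{n} \mathsf{TP}(\rho)$. This is naturally extended to plays by considering the componentwise limit behavior
(i.e., limit taken on each dimension). The infimum (resp. supremum) total-payoff of a play $\pi$ is
$\underline{\mathsf{TP}}(\pi) = \liminf_{n \to \infty} \mathsf{TP}(\pi(n))$ (resp. $\overline{\mathsf{TP}}(\pi) = \limsup_{n \to \infty} \mathsf{TP}(\pi(n))$). The infimum (resp.
supremum) mean-payoff of $\pi$ is $\underline{\mathsf{MP}}(\pi) = \liminf_{n \to \infty} \mathsf{MP}(\pi(n))$ (resp.
$\overline{\mathsf{MP}}(\pi) = \limsup_{n \to \infty} \mathsf{MP}(\pi(n))$).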
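
The two prefix quantities defined above can be computed directly. The following is a minimal sketch, assuming a prefix is given as a list of states and the weight function as a dictionary from edges to integer tuples; the names `total_payoff`, `mean_payoff` and the small two-state example are illustrative only and do not come from the paper.

```python
from fractions import Fraction

def total_payoff(prefix, w):
    """Componentwise sum of the edge weights along the prefix s_0 s_1 ... s_n."""
    k = len(next(iter(w.values())))          # dimension of the weight vectors
    total = [0] * k
    for s, t in zip(prefix, prefix[1:]):     # consecutive states form the edges
        for d, x in enumerate(w[(s, t)]):
            total[d] += x
    return tuple(total)

def mean_payoff(prefix, w):
    """Total-payoff divided by the number n of edges of the prefix (n >= 1)."""
    n = len(prefix) - 1
    if n < 1:
        raise ValueError("the mean-payoff needs a prefix with at least one edge")
    return tuple(Fraction(x, n) for x in total_payoff(prefix, w))

# Hypothetical two-dimensional example with two states a and b.
w = {("a", "b"): (1, -1), ("b", "a"): (-2, 3)}
prefix = ["a", "b", "a", "b"]
print(total_payoff(prefix, w))
print(mean_payoff(prefix, w))
```

The infimum and supremum payoffs of an infinite play are limits of these prefix values and cannot be read off a single finite prefix; the sketch only covers the finite case.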

Strategies. A strategy for $\mathcal{P}_i$, $i \in \{1, 2\}$, in $G$ is a function $\lambda_i \colon \mathsf{Prefs}_i(G) \to S$ such that
$(\mathsf{Last}(\rho), \lambda_i(\rho)) \in E$ for all $\rho \in \mathsf{Prefs}_i(G)$. A strategy $\lambda_i$ for $\mathcal{P}_i$ has finite-memory if it can be encoded
by a deterministic Moore machine $(M, m_0, \alpha_u, \alpha_n)$ where $M$ is a finite set of states (the memory of the strategy),
$m_0 \in M$ is the initial memory state, $\alpha_u \colon M \times S \to M$ is an update function, and $\alpha_n \colon M \times S_i \to S$ is the
next-action function. If the game is in $s \in S_i$ and $m \in M$ is the current memory value, then the strategy chooses
$s' = \alpha_n(m, s)$ as the next state of the game. When the game leaves a state $s \in S$, the memory is updated to
$\alpha_u(m, s)$. Formally, $(M, m_0, \alpha_u, \alpha_n)$ defines the strategy $\lambda_i$ such that
$\lambda_i(\rho \cdot s) = \alpha_n(\hat{\alpha}_u(m_0, \rho), s)$ for all $\rho \in S^*$ and $s \in S_i$, where $\hat{\alpha}_u$ extends $\alpha_u$ to sequences of
states as expected. A strategy is memoryless if $|M| = 1$, i.e., it does not depend on history but only on the current
state of the game. We resp. denote by $\Lambda_i$, $\Lambda_i^F$, and $\Lambda_i^M$ the sets of general (i.e., possibly infinite-memory),
finite-memory, and memoryless strategies for player $\mathcal{P}_i$.
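
As an illustration of the Moore-machine encoding, the sketch below evaluates a finite-memory strategy on a prefix by folding the update function over all states but the last and then applying the next-action function. The class name `MooreStrategy`, the two-valued memory and the three-state game (state `a` with successors `b` and `c`) are assumptions made for the example, not objects from the paper.

```python
class MooreStrategy:
    """Finite-memory strategy given by (m0, update, next_action)."""

    def __init__(self, m0, update, next_action):
        self.m0 = m0                    # initial memory state
        self.update = update            # (memory, state) -> memory
        self.next_action = next_action  # (memory, state) -> successor state

    def __call__(self, prefix):
        *rho, s = prefix                # prefix = rho . s
        m = self.m0
        for state in rho:               # extend the update function to sequences
            m = self.update(m, state)
        return self.next_action(m, s)

# Hypothetical strategy: flip one bit each time state "a" is left,
# and use that bit to alternate between successors "b" and "c".
def update(m, s):
    return 1 - m if s == "a" else m

def next_action(m, s):
    return "b" if m == 0 else "c"

alternate = MooreStrategy(0, update, next_action)
print(alternate(["a"]))
print(alternate(["a", "b", "a"]))
```

A memoryless strategy corresponds to the special case where `update` always returns the same memory value, so the choice depends only on the last state.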

A play $\pi$ is said to be consistent with a strategy $\lambda_i$ of $\mathcal{P}_i$ if for all $n \ge 0$ such that $\mathsf{Last}(\pi(n)) \in S_i$,
we have $\mathsf{Last}(\pi(n+1)) = \lambda_i(\pi(n))$. Given an initial state $s_{init} \in S$, and two strategies, $\lambda_1$ for $\mathcal{P}_1$ and
$\lambda_2$ for $\mathcal{P}_2$, the unique play from $s_{init}$ consistent with both strategies is the outcome of the game, denoted by
$\mathsf{Outcome}_G(s_{init}, \lambda_1, \lambda_2)$.

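Attractors. The attractor for $\mathcal{P}_1$ of a set $A \subseteq S$ in $G$ is denoted by $\mathsf{Attr}_G^{\mathcal{P}_1}(A)$ and computed as the fixed
point of the sequence
$\mathsf{Attr}_G^{\mathcal{P}_1, n+1}(A) = \mathsf{Attr}_G^{\mathcal{P}_1, n}(A) \cup \{s \in S_1 \mid \exists (s,t) \in E,\ t \in \mathsf{Attr}_G^{\mathcal{P}_1, n}(A)\} \cup \{s \in S_2 \mid \forall (s,t) \in E,\ t \in \mathsf{Attr}_G^{\mathcal{P}_1, n}(A)\}$,
with $\mathsf{Attr}_G^{\mathcal{P}_1, 0}(A) = A$. The attractor $\mathsf{Attr}_G^{\mathcal{P}_1}(A)$ is exactly the set of states from which $\mathcal{P}_1$ can ensure
to reach $A$ no matter what $\mathcal{P}_2$ does. The attractor $\mathsf{Attr}_G^{\mathcal{P}_2}(A)$ for $\mathcal{P}_2$ is defined symmetrically.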

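Objectives. An objective for $\mathcal{P}_1$ in $G$ is a set of plays $\phi \subseteq \mathsf{Plays}(G)$. A play $\pi \in \mathsf{Plays}(G)$ is winning for an
objective $\phi$ if $\pi \in \phi$. Given a game $G$ and an initial state $s_{init} \in S$, a strategy $\lambda_1$ of $\mathcal{P}_1$ is winning if
$\mathsf{Outcome}_G(s_{init}, \lambda_1, \lambda_2) \in \phi$ for all strategies $\lambda_2$ of $\mathcal{P}_2$. Given a rational threshold vector $v \in \mathbb{Q}^k$, we
define the infimum (resp. supremum) total-payoff (resp. mean-payoff) objectives as follows:

- $\mathsf{TotalInf}_G(v) = \{\pi \in \mathsf{Plays}(G) \mid \underline{\mathsf{TP}}(\pi) \ge v\}$

- $\mathsf{TotalSup}_G(v) = \{\pi \in \mathsf{Plays}(G) \mid \overline{\mathsf{TP}}(\pi) \ge v\}$

- $\mathsf{MeanInf}_G(v) = \{\pi \in \mathsf{Plays}(G) \mid \underline{\mathsf{MP}}(\pi) \ge v\}$

- $\mathsf{MeanSup}_G(v) = \{\pi \in \mathsf{Plays}(G) \mid \overline{\mathsf{MP}}(\pi) \ge v\}$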
Decision problem. Given a game structure $G$, an initial state $s_{init} \in S$, and an inf./sup. total-payoff/mean-payoff
objective $\phi \subseteq \mathsf{Plays}(G)$, the threshold problem asks to decide if $\mathcal{P}_1$ has a winning strategy for this objective.
The threshold $v$ can be taken equal to $\{0\}^k$ (where $\{0\}^k$ denotes the $k$-dimension zero vector) w.l.o.g. as we
transform the weight function $w$ to $b \cdot w - a$ for any threshold $\frac{a}{b}$, $a \in \mathbb{Z}^k$, $b \in \mathbb{N}_0 = \mathbb{N} \setminus \{0\}$.
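
The effect of the weight transformation can be checked numerically on a prefix. The sketch below only illustrates the mean-payoff case: under the shifted weights $b \cdot w - a$, the mean-payoff of a prefix equals $b$ times the original mean-payoff minus $a$, so it is non-negative exactly when the original one is at least $a/b$. The helper names and the example weights are assumptions made for illustration.

```python
from fractions import Fraction

def shift_weights(w, a, b):
    """Return the weight function b*w - a (componentwise) for threshold a/b."""
    return {e: tuple(b * x - ad for x, ad in zip(vec, a)) for e, vec in w.items()}

def mean_payoff(prefix, w):
    """Mean-payoff of a prefix with at least one edge, as exact fractions."""
    n = len(prefix) - 1
    k = len(next(iter(w.values())))
    sums = [0] * k
    for s, t in zip(prefix, prefix[1:]):
        for d, x in enumerate(w[(s, t)]):
            sums[d] += x
    return tuple(Fraction(x, n) for x in sums)

# Hypothetical example: threshold v = a / b = (3/2, 1/2).
w = {("a", "b"): (1, 0), ("b", "a"): (2, 1)}
a, b = (3, 1), 2
prefix = ["a", "b", "a", "b", "a"]

original = mean_payoff(prefix, w)
shifted = mean_payoff(prefix, shift_weights(w, a, b))
print(original, shifted)
```

Here the original mean-payoff meets the threshold with equality, and the shifted mean-payoff is the zero vector.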

3 Mean-Payoff and Total-Payoff Objectives 

In this section, we discuss classical mean-payoff and total-payoff objectives. We show that while they are closely 
related in one dimension, this relation breaks in multiple dimensions. Indeed, we establish that the threshold problem 
for total-payoff becomes undecidable, both for the infimum and supremum variants. 

First, consider one-dimension games. In this case, memoryless strategies exist for both players for both objectives
[21,8,11,13] and the sup. and inf. mean-payoff problems coincide (which is not the case for total-payoff). Threshold
problems for mean-payoff and total-payoff are closely related as witnessed by Lemma 1 and both have been shown to
be in NP ∩ coNP [28,12].

Lemma 1. Let $G = (S_1, S_2, E, k, w)$ be a two-player game structure and $s_{init} \in S$ be an initial state. Let A, B, C and D
resp. denote the following assertions.

A. Player $\mathcal{P}_1$ has a winning strategy for $\mathsf{MeanSup}_G(\{0\}^k)$.

B. Player $\mathcal{P}_1$ has a winning strategy for $\mathsf{MeanInf}_G(\{0\}^k)$.

C. There exists a threshold $v \in \mathbb{Q}^k$ such that $\mathcal{P}_1$ has a winning strategy for $\mathsf{TotalInf}_G(v)$.

D. There exists a threshold $v' \in \mathbb{Q}^k$ such that $\mathcal{P}_1$ has a winning strategy for $\mathsf{TotalSup}_G(v')$.

For games with one-dimension ($k = 1$) weights, all four assertions are equivalent. For games with multi-dimension
($k > 1$) weights, the only implications that hold are: $C \Rightarrow D \Rightarrow A$ and $C \Rightarrow B \Rightarrow A$. All other implications are false.

The statement of Lemma 1 is depicted in Fig. 1: the only implications that extend to the multi-dimension case are 
depicted by solid arrows. 

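[Figure 1: diagram with four nodes connected by implication arrows. A: $\exists\, \lambda_1^A \models \mathsf{MeanSup}_G(\{0\}^k)$;
D: $\exists\, v \in \mathbb{Q}^k,\ \exists\, \lambda_1^D \models \mathsf{TotalSup}_G(v)$; B: $\exists\, \lambda_1^B \models \mathsf{MeanInf}_G(\{0\}^k)$;
C: $\exists\, v' \in \mathbb{Q}^k,\ \exists\, \lambda_1^C \models \mathsf{TotalInf}_G(v')$.]

Fig. 1: Equivalence between threshold problems for mean-payoff and total-payoff objectives. Dashed implications are
only valid for one-dimension games.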

Proof. Specifically, the implications that remain true in multi-weighted games are the trivial ones: satisfaction of the
infimum version of a given objective trivially implies satisfaction of its supremum version, and satisfaction of infimum
(resp. supremum) total-payoff for some finite threshold $v \in \mathbb{Q}^k$ implies satisfaction of infimum (resp. supremum)
mean-payoff for threshold $\{0\}^k$ as from some point on, the corresponding sequence of mean-payoff infima (resp. suprema)
in all dimensions $t$, $1 \le t \le k$, can be lower-bounded by a sequence of elements of the form $\frac{v(t)}{n}$ with $n$ the length
of the prefix, which tends to zero for an infinite play. That is thanks to the sequence of total-payoffs over prefixes being a
sequence of integers: it always achieves the value of its limit $v(t)$ instead of only tending to it asymptotically as could
a sequence of rationals such as the mean-payoffs. This sums up to $C \Rightarrow D \Rightarrow A$ and $C \Rightarrow B \Rightarrow A$ being true even in
the multi-dimension setting.

In the one-dimension case, all assertions are equivalent. First, we have that infimum and supremum mean-payoff
problems coincide as memoryless strategies suffice for both players. Thus, we add $A \Rightarrow B$ and $D \Rightarrow B$ by transitivity.
Second, consider an optimal strategy for $\mathcal{P}_1$ for the mean-payoff objective of threshold 0. This strategy is such that
all cycles formed in the outcome have non-negative effect, otherwise $\mathcal{P}_1$ cannot ensure winning. Thus, the
total-payoff over any outcome that is consistent with the same optimal strategy is at all times bounded from below by
$-2 \cdot (|S| - 1) \cdot W$ (once for the initial cycle-free prefix, and once for the current cycle being formed). Therefore, we
have that $B \Rightarrow C$, and we obtain all other implications by transitive closure.




Fig. 2: Satisfaction of supremum TP does not imply satisfaction of infimum MP.

Fig. 3: Satisfaction of infimum MP does not imply satisfaction of supremum TP.

For multi-weighted games, all dashed implications are false. We specifically consider two of them. 

1. To show that implication $D \Rightarrow B$ does not hold, consider the one-player game depicted in Fig. 2. Clearly, any finite
vector $v \in \mathbb{Q}^2$ for the supremum total-payoff objective can be achieved by an infinite-memory strategy consisting
in playing both loops successively for longer and longer periods, each time switching after getting back above the
threshold in the considered dimension. However, it is impossible to build any strategy, even with infinite memory,
that provides an infimum mean-payoff of $(0,0)$ as the limit mean-payoff would be at best a linear combination of
the two cycle values, i.e., strictly less than 0 in at least one dimension in any case.

2. Lastly, the failure of implication $B \Rightarrow D$ in multi-weighted games can be witnessed in Fig. 3. Clearly, the strategy
that plays for $n$ steps in the left cycle, then goes for $n$ steps in the right one, then repeats for $n' > n$ and so on, is a
winning strategy for the infimum mean-payoff objective of threshold $(0,0,0)$. Nevertheless, for any strategy of
$\mathcal{P}_1$, the outcome is such that either (i) it only switches between cycles a finite number of times, in which case the
sum in dimension 1 or 2 will decrease to infinity from some point on, or (ii) it switches infinitely often and the sum of
weights in dimension 3 decreases to infinity. In both cases, the supremum total-payoff objective is not satisfied for
any finite vector $v \in \mathbb{Q}^3$.

All other implications can be deduced false as they would otherwise contradict the last two cases by transitivity. 

□ 

In multi-dimension games, recent results have shown that the threshold problem for inf. mean-payoff is
coNP-complete whereas it is in NP ∩ coNP for sup. mean-payoff [27,26]. In both cases, $\mathcal{P}_1$ needs infinite memory to
win, and memoryless strategies suffice for $\mathcal{P}_2$ [4,26]. When restricted to finite-memory strategies, the problem is
coNP-complete [4,26] and requires memory at most exponential for $\mathcal{P}_1$ [6].

The case of total-payoff objectives in multi-weighted game structures has never been considered before. Surpris- 
ingly, the relation established in Lemma 1 cannot be fully transposed in this context. We show that the threshold 
problem indeed becomes undecidable for multi-weighted game structures, even for a fixed number of dimensions. 

Theorem 1. The threshold problem for infimum and supremum total-payoff objectives is undecidable in multi-dimen- 
sion games, for five dimensions. 

We reduce the halting problem for two-counter machines to the threshold problem for two-player total-payoff
games with five dimensions. Counters take values $(v_1, v_2) \in \mathbb{N}^2$ along an execution, and can be incremented or
decremented (if positive). A counter can be tested for equality to zero, and the machine can branch accordingly. We build
a game with a sup. (resp. inf.) total-payoff objective of threshold $(0,0,0,0,0)$ for $\mathcal{P}_1$, in which $\mathcal{P}_1$ has to faithfully
simulate an execution of the machine, and $\mathcal{P}_2$ can retaliate if he does not. We present gadgets by which $\mathcal{P}_2$ checks that
(a) the counters are always non-negative, and that (b) a zero test is only passed if the value of the counter is really zero.
The current value of counters $(v_1, v_2)$ along an execution is encoded as the total sum of weights since the start of the
game, $(v_1, -v_1, v_2, -v_2, -v_3)$, with $v_3$ being the number of steps of the computation. Hence, along a faithful execution,
the 1st and 3rd dimensions are always non-negative, while the 2nd, 4th and 5th are always non-positive. To check that
counters never go below zero, $\mathcal{P}_2$ is always able to go to an absorbing state with a self-loop of weight $(0,1,1,1,1)$
(resp. $(1,1,0,1,1)$). To check that all zero tests on counter 1 (resp. 2) are faithful, $\mathcal{P}_2$ can branch after a test to an
absorbing state with a self-loop of weight $(1,0,1,1,1)$ (resp. $(1,1,1,0,1)$). Using these gadgets, $\mathcal{P}_2$ can punish an
unfaithful simulation as he ensures that the sum in the dimension on which $\mathcal{P}_1$ has cheated always stays strictly negative
and the outcome is thus losing (it is only the case if $\mathcal{P}_1$ cheats, otherwise all dimensions become non-negative). When
an execution halts (with counters equal to zero w.l.o.g.) after a faithful execution, it goes to an absorbing state with
weight $(0,0,0,0,1)$, ensuring a winning outcome for $\mathcal{P}_1$ for the total-payoff objective. If an execution does not halt,
the 5th dimension stays strictly negative and the outcome is losing.

Proof. From a two-counter machine (2CM) $\mathcal{M}$, we construct a two-player game $G$ with five dimensions and an
infimum (equivalently supremum) total-payoff objective such that $\mathcal{P}_1$ wins for threshold $(0,0,0,0,0)$ if and only if the
2CM halts.

A 2CM has two counters that can be incremented or decremented, and can test if their value is equal to zero (called
zero test) and branch accordingly. The halting problem for 2CMs is undecidable [23]. Assume w.l.o.g. that we have a
2CM $\mathcal{M}$ such that if it halts, it halts with the two counters equal to zero.⁵ In the game we construct, $\mathcal{P}_1$ has to
faithfully simulate the 2CM $\mathcal{M}$. The role of $\mathcal{P}_2$ is to ensure that he does so by retaliating if it is not the case, hence
making the outcome losing for the total-payoff objective.

The game is built as follows. The states of $G$ are copies of the control states of $\mathcal{M}$ (plus some special states
discussed in the following). Edges represent transitions between these states. The payoff function maps edges to
5-dimensional vectors of the form $(c_1, -c_1, c_2, -c_2, d)$, that is, two dimensions for the first counter $C_1$, two for the
second counter $C_2$, and one additional dimension. Each increment of counter $C_1$ (resp. $C_2$) in $\mathcal{M}$ is implemented
in $G$ as a transition of weight $(1,-1,0,0,-1)$ (resp. $(0,0,1,-1,-1)$). For decrements, we have weights respectively
$(-1,1,0,0,-1)$ and $(0,0,-1,1,-1)$ for $C_1$ and $C_2$. Therefore, the current value of counters $(v_1, v_2)$ along an execution
of the 2CM $\mathcal{M}$ is represented in the game as the current sum of weights, $(v_1, -v_1, v_2, -v_2, -v_3)$, with $v_3$ the number of
steps of the computation. The two dimensions per counter are used to enforce faithful simulation of non-negativeness
of counters and zero test. The last dimension is decreased by one for every transition, except when the machine halts,
from when it is incremented forever (i.e., the play in $G$ goes to an absorbing state with self-loop $(0,0,0,0,1)$). This is
used to ensure that a play in $G$ is winning iff $\mathcal{M}$ halts.
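
The weight encoding can be checked on small runs. The sketch below is only an illustration of the running-sum invariant and of one punishing self-loop, using the weight vectors given in the text; the constant names and the helper `running_sum` are hypothetical, and zero-test transitions are left out because their weights are not stated here.

```python
# Weight vectors of the simulation edges, as listed in the construction.
INC1 = (1, -1, 0, 0, -1)
INC2 = (0, 0, 1, -1, -1)
DEC1 = (-1, 1, 0, 0, -1)
DEC2 = (0, 0, -1, 1, -1)
# Self-loop of the absorbing state used when counter 1 has gone negative.
STOP_NEG_1 = (0, 1, 1, 1, 1)

def running_sum(steps):
    """Componentwise sum of a sequence of 5-dimensional weight vectors."""
    total = (0, 0, 0, 0, 0)
    for step in steps:
        total = tuple(x + y for x, y in zip(total, step))
    return total

# Faithful run: counters end at (v1, v2) = (1, 1) after 4 steps.
faithful = running_sum([INC1, INC1, INC2, DEC1])
print(faithful)

# Cheating run: decrementing counter 1 at zero, then 10 punishing self-loops.
punished = running_sum([DEC1] + [STOP_NEG_1] * 10)
print(punished)
```

In the cheating run the first dimension stays at -1 however many times the self-loop is taken, which is why the outcome is losing for threshold $(0,0,0,0,0)$.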

We now discuss how this game $G$ ensures faithful simulation of the 2CM $\mathcal{M}$ by $\mathcal{P}_1$.

- Increment and decrement of counter values are easily simulated using the first four dimensions.

- Values of counters may never go below zero. To ensure this, we allow $\mathcal{P}_2$ to branch after every step of the 2CM
simulation to two special states, $s^1_{stop\_neg}$ and $s^2_{stop\_neg}$, which are absorbing and with self-loops of respective weights
$(0,1,1,1,1)$ and $(1,1,0,1,1)$. If a negative value is reached on counter $C_1$ (resp. $C_2$), $\mathcal{P}_2$ can clearly win the game
by branching to state $s^1_{stop\_neg}$ (resp. $s^2_{stop\_neg}$) as the total-payoff in the dimension corresponding to the negative
counter will always stay strictly negative. On the contrary, if $\mathcal{P}_2$ decides to go to $s^1_{stop\_neg}$ (resp. $s^2_{stop\_neg}$) when the
value of $C_1$ (resp. $C_2$) is positive, then $\mathcal{P}_1$ wins the game as this dimension will be positive and the other four will
grow boundlessly. So these transitions are only used if $\mathcal{P}_1$ cheats.

- Zero tests are correctly executed. In the same spirit, we allow $\mathcal{P}_2$ to branch to two absorbing special states after
a zero test, $s^1_{pos\_zero}$ and $s^2_{pos\_zero}$, with self-loops of weights $(1,0,1,1,1)$ and $(1,1,1,0,1)$. Such states are used by
$\mathcal{P}_2$ if $\mathcal{P}_1$ cheats on a zero test (i.e., passes the test with a strictly positive counter value). Indeed, if a zero test was
passed with the value of counter $C_1$ (resp. $C_2$) strictly greater than zero, then the current sum $(v_1, -v_1, v_2, -v_2, -v_3)$
is such that $-v_1$ (resp. $-v_2$) is strictly negative. By going to $s^1_{pos\_zero}$ (resp. $s^2_{pos\_zero}$), $\mathcal{P}_2$ ensures that this sum will
remain strictly negative in the considered dimension forever and the play is lost for $\mathcal{P}_1$.

Therefore, if $\mathcal{P}_1$ does not faithfully simulate $\mathcal{M}$, he is guaranteed to lose in $G$. On the other hand, if $\mathcal{P}_2$ stops
a faithful simulation, $\mathcal{P}_1$ is guaranteed to win. It remains to argue that he wins iff the machine halts. Indeed, if the
machine $\mathcal{M}$ halts, then $\mathcal{P}_1$ simulates its execution faithfully and either he is interrupted and wins, or the simulation
ends in an absorbing state with a self-loop of weight $(0,0,0,0,1)$ and he also wins. Indeed, given that this state can
only be reached with values of counters equal to zero (by hypothesis on the machine $\mathcal{M}$, without loss of generality),
the running sum of weights will reach values $(0,0,0,0,n)$ where $n$ grows to infinity, which ensures satisfaction of the
infimum (and thus supremum) total-payoff objective for threshold $(0,0,0,0,0)$. Conversely, if the 2CM $\mathcal{M}$ does
not halt, $\mathcal{P}_1$ has no way to reach the halting state by means of a faithful simulation and the running sum in the fifth
dimension always stays negative, thus inducing a losing play for $\mathcal{P}_1$, for both variants of the objective.

⁵ This is w.l.o.g. as it suffices to plug a machine that decreases both counters to zero at the end of the execution of the
considered machine.

Consequently, we have that solving multi-weighted games for either the supremum or the infimum total-payoff 
objective is undecidable. □ 

We end this section by noting that in multi-weighted total-payoff games, $\mathcal{P}_1$ may need infinite memory to win,
even when all states belong to him ($S_2 = \emptyset$). Consider the game depicted in Fig. 2. As discussed in the proof of
Lemma 1, given any threshold vector $v \in \mathbb{Q}^2$, $\mathcal{P}_1$ has a strategy to win the supremum total-payoff objective: it suffices
to alternate between the two loops for longer and longer periods, each time waiting to get back above the threshold in
the considered dimension before switching. This strategy needs infinite memory and actually, there exists no
finite-memory strategy that can achieve a finite threshold vector: the negative amount to compensate grows boundlessly
with each alternation, and thus no amount of finite memory can ensure to go above the threshold infinitely often.

4 Window Mean-Payoff Objective 

In one dimension, no polynomial algorithm is known for mean-payoff and total-payoff, and in multiple dimensions,
total-payoff is undecidable. In this section, we introduce the window mean-payoff objective, a conservative
approximation in which local deviations from the threshold must be compensated in a parametrized number of steps. We
consider a window, sliding along a play, within which the compensation must happen. Our approach can be applied
both to mean-payoff and total-payoff objectives. Since we consider finite windows, both versions coincide for threshold
zero. Hence we state our results for mean-payoff.

In Sec. 4.1, we define the objective and discuss its relation with mean-payoff and total-payoff objectives. We then 
divide our analysis into two subsections: Sec. 4.2 for one-dimension games and Sec. 4.3 for multi-dimension games. 
Both provide thorough analysis of the fixed window problem (the bound on the window size is a parameter) and the 
bounded window problem (existence of a bound is the question). We establish solving algorithms, prove complexity 
lower bounds, and study the memory requirements of these objectives. In Sec. 4.4, we briefly discuss the extension of 
our results to a variant of our objective modeling stronger requirements. 

4.1 Definition and comparison 

Objectives and decision problems. Given a multi-weighted two-player game $G = (S_1, S_2, E, k, w)$ and a rational
threshold $v \in \mathbb{Q}^k$, we define the following objectives.⁶

- Given $l_{\max} \in \mathbb{N}_0$, the good window objective

  $$\mathsf{GW}_G(v, l_{\max}) = \Big\{ \pi \;\Big|\; \forall\, t,\ 1 \le t \le k,\ \exists\, l,\ 1 \le l \le l_{\max},\ \frac{1}{l} \sum_{p=0}^{l-1} w\big(e_\pi(p, p+1)\big)(t) \ge v(t) \Big\} \tag{1}$$

  with $e_\pi(p, p+1)$ the edge $(\mathsf{Last}(\pi(p)), \mathsf{Last}(\pi(p+1)))$, requires that for all dimensions, there exists a window
  starting in the first position and bounded by $l_{\max}$ over which the mean-payoff is at least equal to the threshold.

- Given $l_{\max} \in \mathbb{N}_0$, the direct fixed window mean-payoff objective

  $$\mathsf{DirFixWMP}_G(v, l_{\max}) = \big\{ \pi \mid \forall\, j \ge 0,\ \pi(j, \infty) \in \mathsf{GW}_G(v, l_{\max}) \big\} \tag{2}$$

  requires that good windows bounded by $l_{\max}$ exist in all positions along the play.

- The direct bounded window mean-payoff objective

  $$\mathsf{DirBndWMP}_G(v) = \big\{ \pi \mid \exists\, l_{\max} > 0,\ \pi \in \mathsf{DirFixWMP}_G(v, l_{\max}) \big\} \tag{3}$$

  asks that there exists a bound $l_{\max}$ such that the play satisfies the direct fixed objective.

- Given $l_{\max} \in \mathbb{N}_0$, the fixed window mean-payoff objective

  $$\mathsf{FixWMP}_G(v, l_{\max}) = \big\{ \pi \mid \exists\, i \ge 0,\ \pi(i, \infty) \in \mathsf{DirFixWMP}_G(v, l_{\max}) \big\} \tag{4}$$

  is the prefix-independent version of the direct fixed window objective: it asks for the existence of a suffix of the
  play satisfying it.

- The bounded window mean-payoff objective

  $$\mathsf{BndWMP}_G(v) = \big\{ \pi \mid \exists\, l_{\max} > 0,\ \pi \in \mathsf{FixWMP}_G(v, l_{\max}) \big\} \tag{5}$$

  is the prefix-independent version of the direct bounded window objective.

⁶ For brevity, we omit that $\pi \in \mathsf{Plays}(G)$.

For any $v \in \mathbb{Q}^k$ and $l_{\max} \in \mathbb{N}_0$, the following inclusions are true:

$$\mathsf{DirFixWMP}_G(v, l_{\max}) \subseteq \mathsf{FixWMP}_G(v, l_{\max}) \subseteq \mathsf{BndWMP}_G(v), \tag{6}$$
$$\mathsf{DirFixWMP}_G(v, l_{\max}) \subseteq \mathsf{DirBndWMP}_G(v) \subseteq \mathsf{BndWMP}_G(v). \tag{7}$$

Similarly to classical objectives, all objectives can be equivalently expressed for threshold $\{0\}^k$ by modifying the
weight function. Hence, given any variant of the objective, the associated decision problem is to decide the existence
of a winning strategy for $\mathcal{P}_1$ for threshold $\{0\}^k$. Lastly, for complexity purposes, we make a difference between
polynomial (in the size of the game) and arbitrary (i.e., non-polynomial) window sizes.

Notice that all those objectives define Borel sets. Hence they are determined by Martin's theorem [22]. 

Let $\pi = s_0 s_1 s_2 \ldots$ be a play. Fix any dimension $t$, $1 \le t \le k$. The window from position $j$ to $j'$,
$0 \le j < j'$, is closed iff there exists $j''$, $j < j'' \le j'$, such that the sum of weights in dimension $t$ over the
sequence $s_j \ldots s_{j''}$ is non-negative. Otherwise the window is open. Given a position $j'$ in $\pi$, a window is
still open in $j'$ iff there exists a position $0 \le j < j'$ such that the window from $j$ to $j'$ is open. Consider any
edge $(s_i, s_{i+1})$ appearing along $\pi$. If the edge is non-negative, the window starting in $i$ immediately closes.
If not, a window opens that must be closed within $l_{\max}$ steps. Consider the first position $i'$ such that this window
closes. Then all intermediary opened windows also get closed by $i'$, that is, for any $i''$, $i < i'' < i'$, the window
starting in $i''$ is closed before or when reaching position $i'$. Indeed, the sum of weights over the window from $i''$
to $i'$ is strictly greater than the sum over the window from $i$ to $i'$, which is non-negative. We call this fact the
inductive property of windows.
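
The definitions above can be checked mechanically on a finite prefix of a play. The following is a minimal sketch for one dimension, assuming the play is given as a list `weights` in which `weights[i]` is the weight of the edge $(s_i, s_{i+1})$. The function names `closing_time` and `inductive_property_holds` are illustrative and do not come from the paper.

```python
from typing import List, Optional


def closing_time(weights: List[int], i: int, l_max: int) -> Optional[int]:
    """Smallest l in 1..l_max with weights[i] + ... + weights[i+l-1] >= 0.

    Returns None if the window opened at position i is still open after
    l_max steps, or if the finite prefix ends before it closes.
    """
    total = 0
    for l in range(1, l_max + 1):
        if i + l - 1 >= len(weights):
            return None
        total += weights[i + l - 1]
        if total >= 0:
            return l
    return None


def inductive_property_holds(weights: List[int], i: int, l_max: int) -> bool:
    """Check that every window opened strictly between i and the first
    closing position i' of the window at i is itself closed by i'."""
    l = closing_time(weights, i, l_max)
    if l is None:
        return True  # the window at i never closes here: nothing to check
    close = i + l  # this is the position i' of the text
    return all(
        closing_time(weights, k, close - k) is not None
        for k in range(i + 1, close)
    )


weights = [-2, -1, 1, 3, -1, 1]
print(closing_time(weights, 0, 4))  # the window at position 0 closes in 4 steps
print(closing_time(weights, 0, 3))  # None: still open after 3 steps
```

On any input, `inductive_property_holds` should return `True`; it is only a sanity check of the argument given in the text.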




Fig. 4: Fixed window is satisfied for $l_{\max} \ge 2$, whereas even direct bounded window is not.

Fig. 5: Mean-payoff is satisfied but none of the window objectives is.

Illustration. Consider the game depicted in Fig. 4. It has a unique outcome, and it is winning for the classical
mean-payoff objective of threshold 0, as well as for the infimum (resp. supremum) total-payoff objective of threshold
$-1$ (resp. 0). Consider the fixed window mean-payoff objective for threshold 0. If the size of the window is bounded by
1, the play is losing (see footnote 7). However, if the window size is at least 2, the play is winning, as in $s_3$ we
close the window in two steps and in $s_4$ in one step. Notice that by definition of the objective, it is clear that it
is also satisfied for all larger sizes. As the fixed window objective is satisfied for size 2, the bounded window
objective is also satisfied. On the other hand, if we restrict the objectives to their direct variants, then none is
satisfied, as from $s_2$, no window, no matter how large it is, gets closed.

Consider the game of Fig. 5. Again, the unique strategy of $\mathcal{P}_1$ satisfies the mean-payoff objective for
threshold 0. It also ensures value $-1$ for the infimum and supremum total-payoffs. Consider the strategy of
$\mathcal{P}_2$ that takes the self-loop once on the first visit of $s_2$, twice on the second, and so on. Clearly, it
ensures that windows starting in $s_1$ stay open for longer and longer numbers of steps (we say that $\mathcal{P}_2$
delays the closing of the window), hence making the outcome losing for the bounded window objective (and thus the fixed
window objective for any $l_{\max} \in \mathbb{N}_0$). This illustrates the added guarantee (compared to mean-payoff)
asked by the window objective: in this case, no upper bound can be given on the time needed for a window to close, i.e.,
on the time needed to get the local sum back to non-negative. Note that $\mathcal{P}_2$ has to go back to $s_1$ at some
point: otherwise, the prefix-independence of the objectives allows $\mathcal{P}_1$ to wait for $\mathcal{P}_2$ to settle
on cycling and win. For the direct variants, $\mathcal{P}_2$ has a simpler winning strategy consisting in looping
forever, as enforcing one permanently open window is sufficient.

7 A window size of one actually requires that all infinitely often visited edges are of non-negative weights.

Relation with classical objectives. We introduce the bounded window objectives as conservative approximations
of mean-payoff and total-payoff in one-dimension games. Indeed, in Lemma 2, we show that winning the bounded
window (resp. direct bounded window) objective implies winning the mean-payoff (resp. total-payoff) objective, while
the reverse implication only holds if a strictly positive mean-payoff (resp. arbitrarily high total-payoff) can be
ensured.

Lemma 2. Given a one-dimension game $G = (S_1, S_2, E, w)$, the following assertions hold.

(a) If the answer to the bounded window mean-payoff problem is YES, then the answer to the mean-payoff threshold 
problem for threshold zero is also YES. 

(b) If there exists $\varepsilon > 0$ such that the answer to the mean-payoff threshold problem for threshold
$\varepsilon$ is YES, then the answer to the bounded window mean-payoff problem is also YES.

(c) If the answer to the direct bounded window mean-payoff problem is YES, then the answer to the supremum total- 
payoff threshold problem for threshold zero is also YES. 

(d) If the answer to the supremum total-payoff threshold problem is YES for all integer thresholds (i.e., the
total-payoff value is $\infty$), then the answer to the direct bounded window mean-payoff problem is also YES.

Assertions (a) and (c) follow from the decomposition of winning plays into bounded windows of non-negative
weights. The key idea for assertions (b) and (d) is that mean-payoff and total-payoff objectives always admit
memoryless winning strategies, for which the consistent outcomes can be decomposed into simple cycles (i.e., with no
repeated edge) over which the mean-payoff is at least equal to the threshold and whose length is bounded. Hence they
correspond to closing windows. Note that strict equivalence with the classical objectives does not hold, as witnessed
before (Fig. 5).

Proof. Assertion (a). In the one-dimension case, sup. and inf. mean-payoff problems coincide. Let
$\pi \in \mathsf{Plays}(G)$ be such that $\pi \in \mathsf{BndWMP}_G(0)$. There exists $i \ge 0$ such that the suffix of
$\pi$ starting in $i$ can be decomposed into an infinite sequence of bounded segments (i.e., windows) of non-negative
weight. Thus, this suffix satisfies the sup. mean-payoff objective as there are infinitely many positions where the
total sum from $i$ is non-negative. Since the mean-payoff objective is prefix-independent, the play $\pi$ is itself
winning.

Assertion (b). Consider a memoryless winning strategy of $\mathcal{P}_1$ for the mean-payoff of threshold
$\varepsilon > 0$. Only strictly positive simple cycles can be induced by such a strategy. Consider any outcome
$\pi = \sigma_0 \sigma_1 \sigma_2 \ldots$ consistent with it. We claim that for any position $j$ along this play, there
exists a position $j + l$, with $l \le l_{\max} = (|S| - 1) \cdot (1 + |S| \cdot W)$, such that the sum of weights over
the sequence $\rho = \sigma_j \ldots \sigma_{j+l}$ is non-negative. Clearly, if it is the case, then objective
$\mathsf{FixWMP}_G(v, l_{\max})$ is satisfied and so is objective $\mathsf{BndWMP}_G(v)$. Consider the cycle
decomposition $A C_1 C_2 \ldots C_n B$ of this sequence obtained as follows. We push successively
$\sigma_0, \sigma_1, \ldots$ onto a stack, and whenever we push a state that is already in the stack, a simple cycle is
formed that we remove from the stack and append to the cycle decomposition. The sequence $\rho$ is decomposed into an
acyclic part ($A \cup B$) of length (see footnote 8) at most $(|S| - 1)$ and total sum at least $-(|S| - 1) \cdot W$,
and simple cycles of total sum at least 1 and length at most $|S|$. Given the window size $l_{\max}$, we have at least
$(|S| - 1) \cdot W$ simple cycles in the cycle decomposition. Hence, the total sum over $\rho$ is at least zero, which
proves our point.
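
The stack-based cycle decomposition used in this argument can be sketched as follows. This is an illustrative reading of the construction described above, not code from the paper: the function name `cycle_decomposition` and the encoding of a path as a list of states are assumptions.

```python
from typing import Hashable, List, Tuple


def cycle_decomposition(
    states: List[Hashable],
) -> Tuple[List[List[Hashable]], List[Hashable]]:
    """Split a finite path into simple cycles and an acyclic remainder.

    States are pushed one by one onto a stack. When the pushed state is
    already on the stack, the segment from its earlier occurrence up to
    now forms a simple cycle, which is removed from the stack and
    recorded. Whatever remains on the stack is the acyclic part.
    """
    stack: List[Hashable] = []
    cycles: List[List[Hashable]] = []
    for s in states:
        if s in stack:
            k = stack.index(s)
            cycles.append(stack[k:] + [s])  # simple cycle s ... s
            del stack[k + 1:]               # keep the first occurrence of s
        else:
            stack.append(s)
    return cycles, stack


cycles, acyclic = cycle_decomposition(["a", "b", "c", "b", "d", "a", "e"])
print(cycles)   # [['b', 'c', 'b'], ['a', 'b', 'd', 'a']]
print(acyclic)  # ['a', 'e']
```

Since the stack never holds a repeated state, the acyclic remainder involves at most $|S| - 1$ edges and each recorded cycle at most $|S|$ edges, which are the two bounds the proof relies on.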

Assertion (c). Consider a play $\pi \in \mathsf{DirBndWMP}_G(0)$. Using the same decomposition argument as for
assertion (a), we have that the sequence of total sums takes infinitely often values at least equal to zero. Thus the
limit of this sequence of moments bounds from below the limit of the sequence of suprema and is at least equal to zero,
which shows that the supremum total-payoff objective is also satisfied by play $\pi$.

Assertion (d). In one-dimension games, the value of the total-payoff (i.e., the largest threshold for which
$\mathcal{P}_1$ has a winning strategy) is $\infty$ if and only if the value of mean-payoff is strictly positive [12].
Hence, we apply the argument of assertion (b), further noticing that the window open in position $j$ is closed in at
most $l_{\max}$ steps for any $j \ge 0$, which is to say that the direct objective is satisfied. □

8 The length of a sequence is the number of edges it involves.



4.2 Games with one dimension 

We now study the fixed window mean-payoff and the bounded window mean-payoff objectives in one-dimension
games. For the fixed window problem, we establish an algorithm that runs in time polynomial in the size of the game
and in the size of the window and we show that memory is needed for both players. Note that this is in contrast to the
mean-payoff objective, where $\mathcal{P}_2$ is memoryless even in the multi-dimension case (cf. Table 1). Moreover, the
problem is shown to be P-hard even for polynomial window sizes. For the bounded window problem, we show equivalence with
the fixed window problem for size $(|S| - 1) \cdot (|S| \cdot W + 1)$, i.e., this window size is sufficient to win if
possible. The bounded window problem is then shown to be in NP $\cap$ coNP and at least as hard as mean-payoff games.



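Algorithm 1 FWMP($G$, $l_{\max}$)

Require: $G = (S_1, S_2, E, w)$ and $l_{\max} \in \mathbb{N}_0$
Ensure: $W$ is the set of winning states for $\mathcal{P}_1$ for $\mathsf{FixWMP}_G(0, l_{\max})$
  $n := 0$; $W := \emptyset$
  repeat
    $W_d^n$ := DirectFWMP($G$, $l_{\max}$)
    $W_{attr}^n := \mathsf{Attr}_G^{\mathcal{P}_1}(W_d^n)$   {attractor for $\mathcal{P}_1$}
    $W := W \cup W_{attr}^n$; $G := G \downharpoonright (S \setminus W)$; $n := n + 1$
  until $W = S$ or $W_{attr}^{n-1} = \emptyset$
  return $W$

Algorithm 2 DirectFWMP($G$, $l_{\max}$)

Require: $G = (S_1, S_2, E, w)$ and $l_{\max} \in \mathbb{N}_0$
Ensure: $W_d$ is the set of winning states for $\mathcal{P}_1$ for $\mathsf{DirFixWMP}_G(0, l_{\max})$
  $W_{gw}$ := GoodWin($G$, $l_{\max}$)
  if $W_{gw} = S$ or $W_{gw} = \emptyset$ then
    $W_d := W_{gw}$
  else
    $W_d$ := DirectFWMP($G \downharpoonright W_{gw}$, $l_{\max}$)
  return $W_d$

Algorithm 3 GoodWin($G$, $l_{\max}$)

Require: $G = (S_1, S_2, E, w)$ and $l_{\max} \in \mathbb{N}_0$
Ensure: $W_{gw}$ is the set of winning states for $\mathsf{GW}_G(0, l_{\max})$
  for all $s \in S$ do
    $C_0(s) := 0$
  for all $i \in \{1, \ldots, l_{\max}\}$ do
    for all $s \in S_1$ do
      $C_i(s) := \max_{(s,s') \in E} \{ w((s,s')) + C_{i-1}(s') \}$
    for all $s \in S_2$ do
      $C_i(s) := \min_{(s,s') \in E} \{ w((s,s')) + C_{i-1}(s') \}$
  return $W_{gw} := \{ s \in S \mid \exists\, i,\ 1 \le i \le l_{\max},\ C_i(s) \ge 0 \}$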



Fixed window: algorithm. Given a game $G = (S_1, S_2, E, w)$ and a window size $l_{\max} \in \mathbb{N}_0$, we present
an iterative algorithm FWMP (Alg. 1) to compute the winning states of $\mathcal{P}_1$ for the objective
$\mathsf{FixWMP}_G(0, l_{\max})$. Its sketch is the following. Initially, all states are potentially losing for
$\mathcal{P}_1$. The algorithm iteratively declares states to be winning, removes them, and continues the computation on
the remaining subgame as follows. In every iteration, (i) DirectFWMP computes the set of states from which
$\mathcal{P}_1$ can win the direct fixed window objective (they are obviously winning for the fixed window objective),
(ii) thanks to the prefix-independence of the fixed window objective, the attractor to this set is also winning, and
(iii) since $\mathcal{P}_2$ must avoid entering this attractor, he loses some power, and we iterate on the remaining
subgame (the restriction of $G$ to a subset of states $A \subseteq S$ is denoted $G \downharpoonright A$). Thus states
removed over all iterations are winning for $\mathcal{P}_1$. The key argument to establish correctness is as follows:
when the algorithm stops, the remaining set of states $S \setminus W$ is such that $\mathcal{P}_2$ can ensure to stay in
it and falsify the direct fixed window objective by forcing the appearance of one open window larger than $l_{\max}$.
Since he stays in this set, he can repeatedly use this strategy to falsify the fixed window objective. Thus the
remaining set is winning for $\mathcal{P}_2$, and the correctness of the algorithm follows.
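
Step (ii) relies on the classical attractor construction. A minimal sketch follows, assuming a hypothetical encoding in which `succ[s]` lists the successors of state `s` and `owner[s]` is 1 or 2 depending on the player controlling `s`. These names are illustrative, and the game is assumed to be deadlock-free.

```python
from typing import Dict, Hashable, List, Set


def attractor(
    succ: Dict[Hashable, List[Hashable]],
    owner: Dict[Hashable, int],
    target: Set[Hashable],
    player: int,
) -> Set[Hashable]:
    """States from which `player` can force a visit to `target`.

    Least fixed point: a state of `player` is added as soon as one of its
    successors is in the set, a state of the opponent as soon as all of
    its successors are in the set.
    """
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for s, outs in succ.items():
            if s in attr or not outs:
                continue
            if owner[s] == player:
                forced = any(t in attr for t in outs)
            else:
                forced = all(t in attr for t in outs)
            if forced:
                attr.add(s)
                changed = True
    return attr


succ = {"a": ["b", "c"], "b": ["b"], "c": ["a", "t"], "t": ["t"]}
owner = {"a": 2, "b": 1, "c": 1, "t": 1}
print(sorted(attractor(succ, owner, {"t"}, player=1)))  # ['c', 't']
```

This naive fixed-point loop is quadratic in the worst case. The linear-time bound mentioned in the complexity analysis requires the usual predecessor-counting implementation, which is omitted here for brevity.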

The main idea of algorithm DirectFWMP (Alg. 2) is that to win the direct fixed window objective, $\mathcal{P}_1$ must be
able to repeatedly win the good window objective, which consists in ensuring a non-negative sum in at most $l_{\max}$
steps. A winning strategy of $\mathcal{P}_1$ in a state $s$ is thus a strategy that enforces a non-negative sum and, as
soon as the sum turns non-negative (in some state $s'$), starts doing the same from $s'$. It is important to start again
immediately as it ensures that all suffixes along the path from $s$ to $s'$ also have a non-negative sum thanks to the
inductive property of windows. That is, for any state $s''$ in between, the window from $s''$ to $s'$ is closed. The set
of states from which $\mathcal{P}_1$ can ensure winning for the good window objective is computed by subroutine GoodWin
(Alg. 3). Intuitively, given a state $s \in S$ and a number of steps $i \ge 1$, the value $C_i(s)$ is computed
iteratively (from $C_{i-1}(s)$) and represents the best sum that $\mathcal{P}_1$ can ensure from $s$ in exactly $i$
steps. Hence, the set of winning states for $\mathcal{P}_1$ is the set of states for which there exists some $i$,
$1 \le i \le l_{\max}$, such that $C_i(s) \ge 0$. We state the correctness of GoodWin in Lemma 3.
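
The recurrence on $C_i$ can be transcribed directly. The sketch below follows the recurrence exactly as stated in Alg. 3 and makes no claim beyond that. The encoding is an assumption: `succ[s]` is a list of `(successor, weight)` pairs, `owner[s]` is 1 or 2, every state has at least one outgoing edge, and `good_win` is an illustrative name.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, int]  # (successor state, edge weight)


def good_win(
    succ: Dict[Hashable, List[Edge]],
    owner: Dict[Hashable, int],
    l_max: int,
) -> Set[Hashable]:
    """States s with C_i(s) >= 0 for some i in 1..l_max, where
    C_0(s) = 0 and C_i(s) is the max (player 1) or min (player 2)
    over outgoing edges (s, t) of w((s, t)) + C_{i-1}(t)."""
    prev = {s: 0 for s in succ}  # C_0
    winning: Set[Hashable] = set()
    for _ in range(l_max):
        cur = {}
        for s, edges in succ.items():
            values = [w + prev[t] for t, w in edges]
            cur[s] = max(values) if owner[s] == 1 else min(values)
            if cur[s] >= 0:
                winning.add(s)
        prev = cur  # only the previous layer C_{i-1} is needed
    return winning


succ = {"a": [("b", -1)], "b": [("a", 1)], "c": [("c", -1)]}
owner = {"a": 1, "b": 1, "c": 1}
print(sorted(good_win(succ, owner, 1)))  # ['b']
print(sorted(good_win(succ, owner, 2)))  # ['a', 'b']
```

Keeping only the previous layer uses $O(|S|)$ memory. Each of the $l_{\max}$ rounds inspects every edge once.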

Lemma 3. Algorithm GoodWin computes the set of winning states of $\mathcal{P}_1$ for the good window objective in time
$O(|S| \cdot |E| \cdot l_{\max} \cdot V)$, with $V = \lceil \log_2 W \rceil$, the length of the binary encoding of
weights.

Proof. Let $W_g \subseteq S$ denote the winning states for $\mathsf{GW}_G(0, l_{\max})$. We prove that
(a) $s \in W_g \Rightarrow s \in \mathsf{GoodWin}(G, l_{\max})$, and
(b) $s \in \mathsf{GoodWin}(G, l_{\max}) \Rightarrow s \in W_g$.

We first consider case (a). From $s$, there exists a strategy of $\mathcal{P}_1$ that enforces a non-negative sum after
$l$ steps, for some $l$, $1 \le l \le l_{\max}$. Hence, the value $C_l(s)$ computed by the algorithm is non-negative and
$s \in \mathsf{GoodWin}(G, l_{\max})$.

Case (b). Assume $s \in \mathsf{GoodWin}(G, l_{\max})$. By definition of the algorithm GoodWin, there exists some
$l \le l_{\max}$ such that $C_l(s)$ is non-negative. Consequently, taking the choice of $l$ edges that achieves the
maximum value defines a strategy for $\mathcal{P}_1$ that ensures a non-negative sum after $l$ steps, hence closing the
window started in $s$. That is, $s \in W_g$.

It remains to discuss the complexity of GoodWin. Clearly, it takes a number of elementary arithmetic operations
which is bounded by $O(|S| \cdot |E| \cdot l_{\max})$ to compute the set $W_{gw}$. Each elementary arithmetic operation
takes time linear in the number of bits $V$ of the encoding of weights, that is, logarithmic in the largest weight $W$.
Hence, the time complexity of GoodWin is $O(|S| \cdot |E| \cdot l_{\max} \cdot V)$. □

Thanks to the previous lemma, we establish the algorithm solving the direct fixed window objective. 

Lemma 4. Algorithm DirectFWMP computes the set of winning states of $\mathcal{P}_1$ for the direct fixed window
mean-payoff objective in time $O(|S|^2 \cdot |E| \cdot l_{\max} \cdot V)$, with $V = \lceil \log_2 W \rceil$, the length
of the binary encoding of weights.

Proof. Let $W$ be the set of winning states for $\mathsf{DirFixWMP}_G(0, l_{\max})$, i.e.,

$s \in W \Leftrightarrow \exists\, \lambda_1 \in \Lambda_1,\ \forall\, \lambda_2 \in \Lambda_2,\ \mathsf{Outcome}_G(s, \lambda_1, \lambda_2) \in \mathsf{DirFixWMP}_G(0, l_{\max})$.

We first prove (a) $s \in \mathsf{DirectFWMP}(G, l_{\max}) \Rightarrow s \in W$, and then
(b) $s \in W \Rightarrow s \in \mathsf{DirectFWMP}(G, l_{\max})$. First of all, notice that DirectFWMP exactly computes
the set of states $W_d$ such that a non-negative sum is achievable in at most $l_{\max}$ steps, using only states from
which a non-negative sum can also be achieved in at most $l_{\max}$ steps (hence the property is defined recursively).

Consider case (a). Let $s \in W_d$. Consider the following strategy of $\mathcal{P}_1$.

1. Play the strategy prescribed by GoodWin until a non-negative sum is reached. This is guaranteed to be the case in
at most $l_{\max}$ steps. Let $s'$ be the state that is reached in this manner.

2. By construction of $W_d$, we have that $s' \in W_d$. Thus, play the strategy prescribed by GoodWin in $s'$.

3. Continue ad infinitum.

We denote this strategy by $\lambda_1$ and claim it is winning for the direct fixed window objective, i.e., $s \in W$.
Indeed, consider any strategy $\lambda_2$ of $\mathcal{P}_2$ and let $\pi = \mathsf{Outcome}_G(s, \lambda_1, \lambda_2)$.
We have $\pi = \sigma_1 \sigma_2 \ldots \sigma_{m_1} \sigma_{m_1+1} \ldots \sigma_{m_2} \sigma_{m_2+1} \ldots$ with
$\forall\, j \ge 0$, $\sigma_j \in S$ and $\sigma_1 = \sigma_{m_0} = s$, such that all sequences
$\rho(n) = \sigma_{m_n} \ldots \sigma_{m_{n+1}}$ are of length at most $l_{\max} + 1$ ($l_{\max}$ steps) and such that
all strict prefixes of $\rho(n)$ are strictly negative and all suffixes of $\rho(n)$ are non-negative. Indeed, starting
in some state $\sigma_{m_n}$, the strategy $\lambda_1$ keeps a memory of the current sum and tries to reach a
non-negative value (using the strategy prescribed by GoodWin). As soon as such a value is reached in a state
$\sigma_{m_{n+1}}$, the memory of the current sum kept by the strategy is reset to zero and the process is restarted.
That way, for all $j$, $m_n \le j < m_{n+1}$, we have that the sum over the sequence from $\sigma_j$ to
$\sigma_{m_{n+1}}$ is non-negative, hence all intermediate windows are also closed. Thus, the window property is
satisfied everywhere along the play $\pi$, starting in $\sigma_1 = s$, which proves that $s \in W$.

Case (b). Let $\lambda_1$ be a winning strategy of $\mathcal{P}_1$ for $\mathsf{DirFixWMP}_G(0, l_{\max})$. For any
strategy $\lambda_2$ of $\mathcal{P}_2$, the outcome is a play $\pi = \sigma_1 \sigma_2 \ldots$ with $\sigma_1 = s$ such
that the window property is satisfied from all states. In particular, this implies that for all $\sigma_j$, strategy
$\lambda_1$ enforces a non-negative sum in at most $l_{\max}$ steps, that is,
$\sigma_j \in \mathsf{GoodWin}(G, l_{\max})$. Since it is the case for all states $\sigma_j$, we have that
$\mathcal{P}_1$ has a strategy to ensure a non-negative sum in at most $l_{\max}$ steps using only states from which
this property is ensured. Therefore, we conclude that $s \in W_d$.

Again, the number of calls of this algorithm is at most the number of states $|S|$. Let $C_{GW}$ denote the complexity
of algorithm GoodWin. Then, the complexity of algorithm DirectFWMP is $O(|S| \cdot C_{GW})$. □
Finally, we prove the correctness of the algorithm for the fixed window problem.

Lemma 5. Algorithm FWMP computes the set of winning states of $\mathcal{P}_1$ for the fixed window mean-payoff objective
in time $O(|S|^3 \cdot |E| \cdot l_{\max} \cdot V)$, with $V = \lceil \log_2 W \rceil$, the length of the binary encoding
of weights.

Proof. Let $W \subseteq S$ be the set of states that are winning for $\mathsf{FixWMP}_G(0, l_{\max})$, i.e.,

$s \in W \Leftrightarrow \exists\, \lambda_1 \in \Lambda_1,\ \forall\, \lambda_2 \in \Lambda_2,\ \mathsf{Outcome}_G(s, \lambda_1, \lambda_2) \in \mathsf{FixWMP}_G(0, l_{\max})$.

Note that since we set the threshold to be 0 (w.l.o.g.), we may ignore the division by the window size $l$ in eq. (1).
We claim that $\mathsf{FWMP}(G, l_{\max}) = W$. The proof is in two parts:
(a) $s \in \mathsf{FWMP}(G, l_{\max}) \Rightarrow s \in W$, and
(b) $s \in W \Rightarrow s \in \mathsf{FWMP}(G, l_{\max})$.

We begin with (a). Let $(W_d^n)_{n \ge 0}$ and $(W_{attr}^n)_{n \ge 0}$ be the finite sequences of sets computed by the
iterative algorithm. We have that $\mathsf{FWMP}(G, l_{\max}) = \bigcup_{n \ge 0} W_{attr}^n$. For any $n, n'$ such that
$n \ne n'$, we have that $W_{attr}^n \cap W_{attr}^{n'} = \emptyset$ and $W_d^n \cap W_d^{n'} = \emptyset$. Moreover,
for all $n \ge 0$, $W_d^n \subseteq W_{attr}^n$. Let $s \in \mathsf{FWMP}(G, l_{\max})$. There exists a unique
$n \ge 0$ such that $s \in W_{attr}^n$. By construction, from $s$, $\mathcal{P}_1$ has a strategy to reach and stay in
$W_d^n \cup W_{attr}^{n-1} \cup W_{attr}^{n-2} \cup \ldots \cup W_{attr}^0$ and thus $s$ is winning in the subgame
$G \downharpoonright (S \setminus W_{attr}^{n-1})$. However, $\mathcal{P}_2$ still has the possibility to leave $W_d^n$
and reach the set $W_{attr}^{n-1} \cup W_{attr}^{n-2} \cup \ldots \cup W_{attr}^0$. Since the sequence is finite and
$\mathcal{P}_2$ cannot leave $W_d^0$, we have that at some point, any outcome is trapped in some set $W_d^m$,
$0 \le m \le n$, in which $\mathcal{P}_1$ wins the direct fixed window objective. Let $x$ be the length of the finite
prefix outside $W_d^m$. The outcome satisfies the fixed window mean-payoff objective for $i = x$. Therefore, we have
that $s \in W$.

Now consider (b). Let $s \in W$ be a winning state for $\mathsf{FixWMP}_G(0, l_{\max})$. We claim that
$s \in \mathsf{FWMP}(G, l_{\max})$. Suppose it is not the case and consider the sequences $(W_d^n)_{n \ge 0}$ and
$(W_{attr}^n)_{n \ge 0}$ as before. We have that for all $n \ge 0$, $s \notin W_{attr}^n$. In particular,
$\mathcal{P}_2$ can force staying in $S_{trap} = S \setminus \bigcup_{n \ge 0} W_{attr}^n$ when starting in $s$. Since
the algorithm has stopped, we have that $\mathsf{DirectFWMP}(G \downharpoonright S_{trap}, l_{\max}) = \emptyset$. As
algorithm DirectFWMP is correct, from all states of $S_{trap}$, $\mathcal{P}_2$ has a strategy to spoil the direct fixed
window game, i.e., $\mathcal{P}_2$ can force a sequence of states such that there exists a position $j$ along it for
which the window starting in $j$ stays open for at least $(l_{\max} + 1)$ steps, and such that this sequence remains in
$S_{trap}$. Therefore, $\mathcal{P}_2$ can force staying in $S_{trap}$ and seeing infinitely often such sequences, hence
$\mathcal{P}_1$ is losing for the fixed window mean-payoff objective, which contradicts the fact that $s \in W$.

Finally, consider the complexity of the algorithm FWMP. Notice that at least one state is declared winning at each
iteration. The number of calls is thus at most the number of states $|S|$. Computing the attractor is linear in the
number of edges $|E| \le |S|^2$. The overall complexity is thus $O(|S| \cdot (|E| + C_{DW}))$, where $C_{DW}$ is the
complexity of the DirectFWMP algorithm. □

Fixed window: lower bounds. Thanks to the correctness of algorithm FWMP, we also deduce linear upper bounds
(in $|S| \cdot l_{\max}$) on the memory needed for both players (Lemma 6). Indeed, let $s \in S$ be a winning state for
$\mathcal{P}_1$. A winning strategy $\lambda_1$ for $\mathcal{P}_1$ is to (a) reach the set of states $W_d^n$ that are
winning for the direct fixed window objective in the subgame restricted to states $S \setminus W_{attr}^{n-1}$, then
(b) repeatedly play the strategy prescribed by GoodWin in this subgame (i.e., enforce a non-negative sum in at most
$l_{\max}$ steps, see proof of Lemma 4). If $\mathcal{P}_2$ leaves for a lower subgame restricted to $W_{attr}^{n'}$,
$n' < n$, the strategy is to start again part (a) in this subgame. Part (a) is memoryless as it uses a classical
attractor strategy. Part (b) requires to consider, for each state $s'$ in the set computed by DirectFWMP, a number of
memory states which is bounded by $l_{\max}$, as the only memory needed is to select the corresponding successor state
that will maximize the $C_l(s')$ value, for all possible values of $l$, the number of steps remaining to close a window.
Similarly, $\mathcal{P}_2$ needs to be able to prevent the closing of a window repeatedly, and therefore also possibly
needs $l_{\max}$ memory states for each state of the game.

To illustrate that memory is needed by both players, consider the following examples. First, consider a game where
all states belong to $\mathcal{P}_1$ and such that the play starts in a central state $s$ and in $s$, there are three
outgoing edges, towards three simple cycles $C_1$, $C_2$, and $C_3$. All other states have only one outgoing edge. Cycle
$C_1$ is composed of six edges of successive weights $3, 3, 5, -1, -1$ and $-5$. Cycle $C_2$ is $7, -1$ and $-9$. Cycle
$C_3$ is $5, 5$ and $-11$. The objective is $\mathsf{FixWMP}_G(0, l_{\max} = 4)$. Clearly, from some point on, a winning
strategy of $\mathcal{P}_1$ has to infinitely alternate between cycles in the following way: $(C_1 C_2 C_3)^\omega$. Any
other alternation leads to a bad window appearing infinitely often: hence, the decision of $\mathcal{P}_1$ in $s$
depends on the remaining number of steps to ensure a good window. Second, consider a similar game but with all states
belonging to $\mathcal{P}_2$. Again, the initial state is central and there are two cycles $C_1$ and $C_2$ such that
$C_1$ is $1$ followed by $-1$, and $C_2$ is $-1, -1$ and $2$. The objective is $\mathsf{FixWMP}_G(0, l_{\max} = 3)$. If
$\mathcal{P}_2$ is memoryless, both possible strategies induce a winning play for $\mathcal{P}_1$. On the other hand, if
$\mathcal{P}_2$ is allowed to alternate, he can choose the play $(C_1 C_2)^\omega$ which will be losing for
$\mathcal{P}_1$ as the window $-1, -1, -1$ will appear infinitely often.
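
Both examples can be verified numerically. The sketch below checks, for a play that repeats a finite weight sequence forever, whether every window closes within $l_{\max}$ steps. The helper name `periodic_fixed_window_ok` is illustrative, and the cycle weights are those given above.

```python
from typing import List


def periodic_fixed_window_ok(cycle: List[int], l_max: int) -> bool:
    """True iff, in the infinite play repeating `cycle` forever, the
    window opened at every position closes within l_max steps.

    Because the play is periodic, it suffices to check one period.
    """
    n = len(cycle)
    for start in range(n):
        total = 0
        closed = False
        for step in range(l_max):
            total += cycle[(start + step) % n]
            if total >= 0:
                closed = True
                break
        if not closed:
            return False
    return True


# First example (all states belong to player 1), window size 4.
c1, c2, c3 = [3, 3, 5, -1, -1, -5], [7, -1, -9], [5, 5, -11]
print(periodic_fixed_window_ok(c1 + c2 + c3, 4))  # True
print(periodic_fixed_window_ok(c1, 4))            # False

# Second example (all states belong to player 2), window size 3.
d1, d2 = [1, -1], [-1, -1, 2]
print(periodic_fixed_window_ok(d1, 3))       # True
print(periodic_fixed_window_ok(d2, 3))       # True
print(periodic_fixed_window_ok(d1 + d2, 3))  # False
```

The check only covers ultimately periodic plays that repeat the given sequence from the start, which is enough for these two illustrations.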

Lemma 6. In one-dimension games with a fixed window mean-payoff objective, memory is needed by both players 
and linear memory in the number of states times the window size is sufficient. 

Through Lemma 5, we have shown that the fixed window problem admits a polynomial (in $|S|$, $V$ and $l_{\max}$)
algorithm. In Lemma 7, we prove that even for window size $l_{\max} = 1$ and weights $\{-1, 1\}$, the problem is P-hard.
This is via a reduction from reachability games. By making the target states absorbing with a self-loop of weight 1, and
giving weight $-1$ on all other edges, we obtain the reduction, as reaching a target state is now the only way to ensure
that windows close.

Lemma 7. In two-player one-dimension games, the fixed window mean-payoff problem is P-hard, even for $l_{\max} = 1$
and weights $\{-1, 1\}$.

Proof. Let $G_r = (S_1, S_2, E)$ be an unweighted game with a reachability objective asking to visit (at least once) a
state of the set $R \subseteq S$. We build the game $G = (S_1, S_2, E', w)$ by (a) making the target states absorbing
with a self-loop of weight 1, i.e., for all $s \in R$, we have $(s,s) \in E'$ and $w((s,s)) = 1$, and (b) putting weight
$-1$ on all other edges, i.e., for all edges $(s,t) \in E$ such that $s \notin R$, we have $(s,t) \in E'$ and
$w((s,t)) = -1$. We claim that $\mathcal{P}_1$ has a winning strategy in $G_r$ from a state $s \in S$ if and only if he
has a winning strategy for the objective $\mathsf{FixWMP}_G(0, l_{\max} = 1)$ in $G$ from $s \in S$. Indeed, it is clear
that any outcome that never reaches the target set is such that all windows stay indefinitely open, and conversely, an
outcome that reaches this set after $n$ steps is winning for the fixed window objective with $i = n$. Since deciding the
winner in reachability games is P-complete, this concludes our proof. □
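
The construction of $G$ from $G_r$ is simple enough to sketch in a few lines. The encoding is an assumption made for illustration: `succ[s]` lists the successors of `s` in the reachability game, and the result maps each state to a list of `(successor, weight)` pairs. The ownership of states is unchanged by the reduction and therefore omitted.

```python
from typing import Dict, Hashable, List, Set, Tuple


def reachability_to_fixed_window(
    succ: Dict[Hashable, List[Hashable]],
    targets: Set[Hashable],
) -> Dict[Hashable, List[Tuple[Hashable, int]]]:
    """Build the weighted edge relation of the reduction.

    (a) Every target state becomes absorbing, with a single self-loop
        of weight 1.
    (b) Every edge leaving a non-target state gets weight -1.
    """
    weighted: Dict[Hashable, List[Tuple[Hashable, int]]] = {}
    for s, outs in succ.items():
        if s in targets:
            weighted[s] = [(s, 1)]
        else:
            weighted[s] = [(t, -1) for t in outs]
    return weighted


succ = {"a": ["b", "a"], "b": ["a", "r"], "r": ["a"]}
print(reachability_to_fixed_window(succ, {"r"}))
# {'a': [('b', -1), ('a', -1)], 'b': [('a', -1), ('r', -1)], 'r': [('r', 1)]}
```

With window size 1, a window closes only on an edge of non-negative weight, and the only such edges are the self-loops on target states.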

Fixed window: summary. We sum up the complexity analysis of the fixed window problem in Theorem 2. 

Theorem 2. In two-player one-dimension games, (a) the fixed arbitrary window mean-payoff problem is decidable
in time $O(|S|^3 \cdot |E| \cdot l_{\max} \cdot V)$, with $V = \lceil \log_2 W \rceil$, the length of the binary encoding
of weights, and (b) the fixed polynomial window mean-payoff problem is P-complete. In general, both players require
memory, and memory of size linear in $|S| \cdot l_{\max}$ is sufficient.

Bounded window: algorithm. In the following, we focus on the bounded window mean-payoff problem for two-player
one-dimension games. We start with two technical lemmas related to the classical supremum total-payoff
threshold problem. Using these lemmas, we establish an NP $\cap$ coNP algorithm to solve the bounded window problem
and, as a corollary, we get an interesting bound on the window size needed to win the fixed window problem if
possible.

The first technical lemma (Lemma 8) states that if $\mathcal{P}_1$ has a strategy to win the supremum total-payoff
objective from some state $s_{init}$, then he can force a non-negative sum from this state in at most
$(|S| - 1) \cdot (|S| \cdot W + 1)$ steps, i.e., he wins the good window objective for this window size.

Lemma 8. Let $G = (S_1, S_2, E, w)$ be a two-player one-dimension game. If $\mathcal{P}_1$ has a strategy to win for
objective $\mathsf{TotalSup}_G(0)$ from initial state $s_{init} \in S$, then $\mathcal{P}_1$ also has a strategy to win
for the good window objective $\mathsf{GW}_G(0, l_{\max})$ from $s_{init}$ for
$l_{\max} = (|S| - 1) \cdot (|S| \cdot W + 1)$.

This result is obtained by considering a memoryless winning strategy of V\ for the total-payoff and the decompo- 
sition in simple cycles of any consistent outcome where (a) either simple cycles are strictly positive, or (b) they are of 
value zero but preceded by a non-negative prefix. 

Proof. Let $\lambda_1 \in \Lambda_1^{PM}$ be a memoryless winning strategy of $\mathcal{P}_1$ for
$\mathsf{TotalSup}_G(0)$. Our claim is that for all possible outcomes $\pi$ consistent with $\lambda_1$ starting in the
initial state $s_{init}$, there exists a prefix $\rho$ of $\pi$ of size at most $l_{\max}$ such that the total sum of
weights over $\rho$ is non-negative. Let $\pi$ be any outcome consistent with $\lambda_1$ and $\rho_1$ its prefix of
length $(|S| - 1) \cdot (|S| \cdot W + 1)$. Consider the cycle decomposition (see the proof of Lemma 2) of $\rho_1$:
$A, C_1, C_2, \ldots, C_m, B$, with $A$ the prefix before the first cycle and $B$ the suffix after the last cycle in
$\rho_1$. The total length of the acyclic part is $|A| + |B| \le |S| - 1$. We claim that there exists a prefix $\rho$ of
$\rho_1$ such that the total sum of weights over $\rho$ is non-negative. Consider the following arguments:

1. No cycle $C$ in $\{C_1, \ldots, C_m\}$ can be strictly negative. Otherwise, since $\lambda_1$ is memoryless,
$\mathcal{P}_2$ could force cycling in such a cycle forever and the play would be losing for the supremum total-payoff
objective, which contradicts $\lambda_1$ being a winning strategy.

2. Assume that there exists a cycle $C$ in $\{C_1, \ldots, C_m\}$ such that the sum of weights over this cycle is zero.
We define the high point of a cycle as the first state where the sum from the start of the cycle takes its highest
value. Then, the prefix $\rho$ of $\rho_1$ up to this high point is non-negative and we are done. Indeed, assume it is
not the case. Then, the running sum over the outcome $\pi$ is strictly negative when reaching the high point, and stays
strictly negative in all positions along the cycle $C$, by definition of the high point. Therefore, $\mathcal{P}_2$ can
force cycling forever in $C$ since $\lambda_1$ is memoryless and the outcome becomes losing for the total-payoff
objective.

3. So assume there are only strictly positive cycles in the cycle decomposition of $\rho_1$, that is, they all have a
total sum of value at least 1. The total sum over $C_1, \ldots, C_m$ is at least equal to $m$. Since each cycle is of
length at most $|S|$ and $A \cup B$ is of length at most $|S| - 1$, we have that the number of cycles $m$ in the cycle
decomposition of $\rho_1$ is at least
$((|S| - 1) \cdot (|S| \cdot W + 1) - (|S| - 1)) / |S| = (|S| - 1) \cdot W$. Given that the total sum over prefix $A$ is
at least $-(|S| - 1) \cdot W$, we obtain that $\rho = A C_1 C_2 \ldots C_m$ is the desired prefix with a non-negative
total sum, and its length is bounded by $(|S| - 1) \cdot (|S| \cdot W + 1)$.

This concludes our proof. □ 
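
The counting step in item 3 can be spelled out as a short worked calculation. It only restates the bounds used in the proof above, with $\mathrm{sum}(\cdot)$ as an ad hoc shorthand for the total weight of a sequence.

```latex
\begin{align*}
m \;&\ge\; \frac{l_{\max} - (|A| + |B|)}{|S|}
   \;\ge\; \frac{(|S|-1)\cdot(|S|\cdot W + 1) - (|S|-1)}{|S|}
   \;=\; \frac{(|S|-1)\cdot |S| \cdot W}{|S|}
   \;=\; (|S|-1)\cdot W, \\[4pt]
\mathrm{sum}(A\,C_1 \cdots C_m) \;&\ge\; \underbrace{-(|S|-1)\cdot W}_{\text{acyclic part } A}
   \;+\; \underbrace{m \cdot 1}_{\text{cycles of sum } \ge 1}
   \;\ge\; 0 .
\end{align*}
```

The first line uses that each simple cycle has length at most $|S|$ and the acyclic part has length at most $|S| - 1$. The second uses that every edge weight is at least $-W$.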

The second technical lemma (Lemma 9) shows that if $\mathcal{P}_2$ has a strategy to ensure that the supremum
total-payoff from some state $s_{init}$ is strictly negative, then he has a memoryless strategy to do so and any outcome
$\pi$ starting in $s_{init}$ and consistent with this strategy is such that the direct bounded window mean-payoff
objective is not satisfied.

Lemma 9. Let $G = (S_1, S_2, E, w)$ be a two-player one-dimension game. If $\mathcal{P}_2$ has a spoiling strategy for
objective $\mathsf{TotalSup}_G(0)$ from initial state $s_{init} \in S$, then $\mathcal{P}_2$ has a strategy
$\lambda_2 \in \Lambda_2^{PM}$ to ensure that for all possible outcomes $\pi = \sigma_0 \sigma_1 \ldots$ consistent with
$\lambda_2$ starting in $\sigma_0 = s_{init}$, there exists a position $i \ge 0$ such that for all window sizes
$l \ge 1$, the total sum of weights on the window from $\sigma_i$ to $\sigma_{i+l}$ is strictly negative.

Proof. By contradiction. Let $\lambda_2 \in \Lambda_2^{PM}$ be a memoryless spoiling strategy for objective
$\mathsf{TotalSup}_G(0)$ from $s_{init} \in S$. Let $\pi$ be a consistent outcome and assume that it does not respect the
lemma, i.e., for all positions $i \ge 0$, there exists a window size $l \ge 1$ such that the window from $\sigma_i$ to
$\sigma_{i+l}$ is non-negative. Then the play $\pi$ can be decomposed as a sequence of finite windows of non-negative
weights. Hence, the total sum from $\sigma_0 = s_{init}$ takes infinitely often values at least equal to zero and the
limit of its suprema is non-negative. This is in contradiction to $\lambda_2$ being a winning strategy for
$\mathcal{P}_2$. □

Thanks to Lemma 8 and Lemma 9, we are now able to establish an NP $\cap$ coNP algorithm (Alg. 4) to solve the
bounded window mean-payoff problem on two-player one-dimension games. Lemma 10 states its correctness.



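Algorithm 4 BoundedProblem($G$)

Require: Game $G = (S_1, S_2, E, w)$
Ensure: $W_{bp}$ is the set of winning states for $\mathcal{P}_1$ for the bounded window mean-payoff problem
  $W_{bp} := \emptyset$
  $L$ := UnbOpenWindow($G$)
  while $L \ne S \setminus W_{bp}$ do
    $W_{bp} := \mathsf{Attr}_G^{\mathcal{P}_1}(S \setminus L)$
    $L$ := UnbOpenWindow($G \downharpoonright (S \setminus W_{bp})$)
  return $W_{bp}$

Algorithm 5 UnbOpenWindow($G$)

Require: Game $G = (S_1, S_2, E, w)$
Ensure: $L$ is the set of states from which $\mathcal{P}_2$ can force a position for which the window never closes
  $p := 0$; $L_0 := \emptyset$
  repeat
    $L_{p+1} := L_p \cup \mathsf{Attr}_{G \downharpoonright (S \setminus L_p)}^{\mathcal{P}_2}(\mathsf{NegSupTP}(G \downharpoonright (S \setminus L_p)))$
    $p := p + 1$
  until $L_p = L_{p-1}$
  return $L := L_p$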



Algorithm BoundedProblem (Alg. 4) computes via a subroutine UnbOpenWindow the set of states from which
$\mathcal{P}_2$ can force the visit of a position such that the window opening in this position never closes. Clearly,
to prevent $\mathcal{P}_1$ from winning the bounded window problem, $\mathcal{P}_2$ must be able to do so repeatedly, as
the prefix-independence of the objective otherwise gives the possibility to wait until all such bad positions have been
encountered before taking the windows into account. Therefore, the states that are not in UnbOpenWindow($G$), as well as
their attractor, are winning for $\mathcal{P}_1$. Since the choices of $\mathcal{P}_2$ are decreased because of the
attractor of $\mathcal{P}_1$ being declared winning, we compute this in several steps, adding new states to the set of
winning states for $\mathcal{P}_1$ up to stabilization.

Now consider the subroutine UnbOpenWindow (Alg. 5). Its correctness is based on Lemma 9. Indeed, it computes
the set of states from which $\mathcal{P}_2$ can force a position for which the window never closes. To do so, it
suffices to compute the attractor for $\mathcal{P}_2$ of the set of states from which $\mathcal{P}_2$ can enforce a
strictly negative supremum total-payoff. Routine NegSupTP returns this set of states in NP $\cap$ coNP complexity [12].
Again, we compute the fixed point of the sequence as at each iteration, the choices of $\mathcal{P}_1$ are reduced.

The main idea of the correctness proof is that from all states in $S \setminus W_{bp}$, $\mathcal{P}_2$ has an
infinite-memory winning strategy which is played in rounds, and in round $n$ ensures an open window of size at least $n$
by playing the total-payoff strategy of $\mathcal{P}_2$ for at most $n \cdot |S|$ steps, and then proceeds to round
$(n + 1)$ to ensure an open window of size $(n + 1)$, and so on. Hence, windows stay open for arbitrarily large periods
and the bounded window objective is falsified.

Lemma 10. Given a two-player one-dimension game $G = (S_1, S_2, E, w)$, the algorithm BoundedProblem computes
the set of winning states for $\mathcal{P}_1$ for the bounded window mean-payoff objective of threshold 0 in time
$O(|S|^2 \cdot (|E| + C))$, where $C$ is the complexity of algorithm NegSupTP, i.e., the complexity of computing the set
of winning states in a two-player one-dimension supremum total-payoff game. Thus, algorithm BoundedProblem is in
NP $\cap$ coNP.

Proof. It suffices to show that for all states in $W_{bp} = \mathsf{BoundedProblem}(G)$, there exists a winning strategy
of $\mathcal{P}_1$, whereas for all states in $S \setminus W_{bp}$, there exists one of $\mathcal{P}_2$.

Consider a state $s \in W_{bp}$. Consider $(L^m)_{0 \le m < n}$, the finite sequence of sets $L$ that are computed by
BoundedProblem, with $L^0 = \mathsf{UnbOpenWindow}(G)$; and $(W_{bp}^m)_{0 \le m \le n}$, the corresponding finite
sequence of sets $W_{bp}$ where $W_{bp}^0 = \emptyset$ is empty and $W_{bp}^n = W_{bp}$ is the returned set of winning
states. For all $m', m$, $0 \le m' < m \le n$, we have that $W_{bp}^m \supseteq W_{bp}^{m'}$ and
$L^m \subseteq L^{m'}$. By construction, there exists $m$, $1 \le m \le n$, such that
$s \in W_{bp}^m = \mathsf{Attr}_G^{\mathcal{P}_1}(S \setminus L^{m-1})$. In the subgame
$G \downharpoonright ((S \setminus L^{m-1}) \setminus W_{bp}^{m-1})$, $\mathcal{P}_1$ has a memoryless [13] winning
strategy for the supremum total-payoff objective. Hence, consider the strategy $\lambda_1$ of $\mathcal{P}_1$ which is
to reach the set $(S \setminus L^{m-1})$ (in at most $|S|$ steps) and then play the memoryless total-payoff strategy in
the subgame. It is possible for $\mathcal{P}_2$ to force leaving this subgame for a lower subset
$W_{bp}^{m'} \subset W_{bp}^m$ with $m' < m$ but since the sequence is finite, any outcome is ultimately trapped in some
subgame $G \downharpoonright ((S \setminus L^{m''}) \setminus W_{bp}^{m''})$. Therefore, repeating the strategy
$\lambda_1$ in each subgame ensures that after a finite number of steps (and hence a finite number of positions for
which windows never close), a bottom subgame $G \downharpoonright ((S \setminus L^{m''}) \setminus W_{bp}^{m''})$ is
reached and, by Lemma 8, strategy $\lambda_1$ ensures satisfaction of the good window objective for
$l_{\max} = (|S| - 1) \cdot (|S| \cdot W + 1)$ in this subgame. Moreover, since this strategy never visits states out of
the bottom subgame, it ensures an inductive window from every state, regardless of the past. Hence, all intermediate
windows are also closed and this strategy is winning for
$\mathsf{FixWMP}_G(0, l_{\max}) \subseteq \mathsf{BndWMP}_G(0)$ from the initial state $s$. The states that are only
visited finitely often before reaching the bottom subgame have no consequence thanks to the prefix-independence of the
bounded window mean-payoff objective.

As for $\mathcal{P}_2$, consider a state $s \in S \setminus W_{bp}$. Consider $(L_p)_{0 \le p \le q}$, the finite sequence of sets $L$ that are
computed in the last call to UnbOpenWindow by BoundedProblem, with $L_0 = \emptyset$. We define the sequences
$(N_p)_{1 \le p \le q}$ and $(A_p)_{1 \le p \le q}$ as $N_p = \mathsf{NegSupTP}(G \downharpoonright (S \setminus L_{p-1}))$ and
$A_p = L_p \setminus L_{p-1} = \mathsf{Attr}^{\mathcal{P}_2}_{G \downharpoonright (S \setminus L_{p-1})}(N_p)$. We have that $s \in L_p$ for some $p$
between 1 and $q$. An infinite-memory winning strategy for $\mathcal{P}_2$ is played in rounds. In round $n$, $\mathcal{P}_2$ acts as
follows. (a) If the current state is in $A_p$, play the attractor to $N_p$ and then play the optimal strategy for the supremum
total-payoff in $A_p$, to ensure that no window will have a non-negative sum for $n$ steps. (b) $\mathcal{P}_1$ can leave the set $A_p$
for some lower set $A_{p'}$, $1 \le p' < p$. If so, play the attractor to $N_{p'}$ and continue. Ultimately, any outcome is
trapped in some set $N_{p''} \setminus A_{p''-1}$, with $1 \le p'' \le q$ and $A_0 = \emptyset$, as in $A_1$, $\mathcal{P}_1$ cannot leave. There,
$\mathcal{P}_1$ cannot prevent the window from being strictly negative for $n$ steps. When such a window has been enforced for
$n$ steps, move to round $n + 1$ and start again. This strategy ensures that the bounded window problem is not satisfied
as, infinitely often, windows stay open for arbitrarily large periods along any outcome.

Finally, we discuss the complexity of algorithm BoundedProblem. Let $C$ be the complexity of routine NegSupTP,
that is, the complexity of solving a one-dimension supremum total-payoff game. The total complexity of subalgorithm
UnbOpenWindow is $\mathcal{O}(|S| \cdot (|E| + C))$, as the sequence of computations is of length at most $|S|$ and each
computation takes time $\mathcal{O}(|E| + C)$. The overall complexity of BoundedProblem is thus
$\mathcal{O}(C + |S| \cdot (|E| + |S| \cdot (|E| + C))) = \mathcal{O}(|S|^2 \cdot (|E| + C))$. □

An interesting corollary of Lemma 8 and Lemma 10 is that the sets of winning states coincide for objectives
$\mathsf{FixWMP}_G(0, l_{\max} = (|S| - 1) \cdot (|S| \cdot W + 1))$ and $\mathsf{BndWMP}_G(0)$, therefore proving NP ∩ coNP membership
for the subset of fixed window problems with window size at least $l_{\max}$ (hence an algorithm independent of the window
size, whereas Lemma 4 gives an algorithm which is polynomial in the window size).

Corollary 1. In two-player one-dimension games, the fixed window mean-payoff problem is in NP ∩ coNP for window
size at least equal to $(|S| - 1) \cdot (|S| \cdot W + 1)$.

Bounded window: lower bounds. Algorithm BoundedProblem (Lemma 10) provides memoryless winning strategies
for $\mathcal{P}_1$ (attractor + memoryless strategy for total-payoff) and infinite-memory winning strategies for $\mathcal{P}_2$
(delaying the closing of windows for an increasing number of steps each round) in one-dimension bounded window
mean-payoff games. Lemma 11 states that infinite memory is necessary for $\mathcal{P}_2$, as discussed in Section 4.1:
$\mathcal{P}_2$ cannot use the zero cycle forever, but he must cycle long enough to defeat any finite window. Hence, his
strategy needs to cycle for longer and longer, which requires infinite memory.

Lemma 11. In one-dimension games with a bounded window mean-payoff objective, (a) memoryless strategies suffice
for $\mathcal{P}_1$, and (b) infinite-memory strategies are needed for $\mathcal{P}_2$ in general.

In Lemma 14, we give a polynomial reduction from mean-payoff games to bounded window mean-payoff games,
therefore showing that a polynomial algorithm for the bounded window problem would solve the long-standing question
of the P membership of the mean-payoff threshold problem. The proof relies on technical lemmas providing
intermediary reductions. First, we prove that given a game $G$, deciding if $\mathcal{P}_1$ has a strategy to ensure a
non-negative mean-payoff can be reduced to deciding if $\mathcal{P}_1$ has a strategy to ensure a strictly positive mean-payoff
when weights are shifted positively by a sufficiently small $\varepsilon$ (Lemma 12). Second, we apply Lemma 2 on the shifted
game to prove that winning this objective implies winning the bounded window problem. This gives one direction of the
reduction. For the other one, we show that given a game $G$, if $\mathcal{P}_1$ has a strategy to win the bounded window problem
when weights are shifted positively by a sufficiently small $\varepsilon$, he has one to win the mean-payoff threshold problem
in $G$.

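We define the following notation: given a two-player one-dimension game $G = (S_1, S_2, E, w)$ and $\varepsilon \in \mathbb{Q}$, let
$G_{+\varepsilon} = (S_1, S_2, E, w_{+\varepsilon})$ be the game obtained by shifting all weights by $\varepsilon$, that is, for all $e \in E$,
$w_{+\varepsilon}(e) = w(e) + \varepsilon$ (see footnote 9).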
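
As an illustration of this notation, the shift can be written with exact rational arithmetic. The sketch below is not taken from the paper: the function names and the dictionary-based edge representation are assumptions. The second function shows one natural way to return to integer weights, by multiplying every weight by a common denominator; scaling by a positive constant preserves the sign of every sum of weights, so it does not affect decision problems with threshold 0.

```python
from fractions import Fraction
from math import gcd
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def shift_weights(weights: Dict[Edge, int], eps: Fraction) -> Dict[Edge, Fraction]:
    """Weight function of the shifted game: every edge weight is increased by eps."""
    return {e: Fraction(w) + eps for e, w in weights.items()}


def to_integer_weights(weights: Dict[Edge, Fraction]) -> Dict[Edge, int]:
    """Scale all weights by the least common multiple of their denominators."""
    scale = 1
    for w in weights.values():
        d = Fraction(w).denominator
        scale = scale * d // gcd(scale, d)
    return {e: int(Fraction(w) * scale) for e, w in weights.items()}
```

For instance, shifting the weights 1 and -1 by 1/3 gives 4/3 and -2/3, which scale to the integers 4 and -2.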

Lemma 12. For every one-dimension game $G = (S_1, S_2, E, w)$ with integer weights, for all $\varepsilon$, $0 < \varepsilon < 1/|S|$, for
every initial state $s \in S$, $\mathcal{P}_1$ has a strategy to ensure a non-negative mean-payoff in $G$ if and only if
$\mathcal{P}_1$ has a strategy to ensure a strictly positive mean-payoff in $G_{+\varepsilon}$.

Proof. Consider a memoryless winning strategy of $\mathcal{P}_1$ in $G$ from initial state $s \in S$. All simple cycles in
consistent outcomes have a sum of weights at least equal to zero. Hence, the corresponding outcome in $G_{+\varepsilon}$ is such
that all simple cycles of length $n$ have sums at least equal to $n \cdot \varepsilon > 0$, which proves that the strategy is also
winning in $G_{+\varepsilon}$.

Consider a memoryless winning strategy of $\mathcal{P}_2$ in $G$ from initial state $s \in S$. All simple cycles in consistent
outcomes have a strictly negative sum of weights, that is, the sum is at most equal to $-1$. Hence, the corresponding
outcome in $G_{+\varepsilon}$ is such that all simple cycles of length $n$ have sums at most equal to $-1 + n \cdot \varepsilon$. Since
$n \le |S|$ and $\varepsilon < 1/|S|$, we have that the sum is strictly negative, which proves that the strategy is also winning
in $G_{+\varepsilon}$.

By determinacy of mean-payoff games, we obtain the claim. □
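
The arithmetic behind this proof can be checked on concrete cycles. The following sketch is illustrative only (the helper names are assumptions): it verifies that a cycle with integer weights keeps its winner after the shift, that is, a non-negative sum becomes strictly positive and a sum of at most $-1$ stays strictly negative, provided the cycle has at most $|S|$ edges and $\varepsilon < 1/|S|$.

```python
from fractions import Fraction
from typing import Sequence


def shifted_cycle_sum(cycle_weights: Sequence[int], eps: Fraction) -> Fraction:
    """Sum of a cycle after adding eps to each of its edges."""
    return sum((Fraction(w) for w in cycle_weights), Fraction(0)) + len(cycle_weights) * eps


def sign_preserved(cycle_weights: Sequence[int], eps: Fraction) -> bool:
    """True if 'sum >= 0' maps to 'shifted sum > 0' and 'sum < 0' to 'shifted sum < 0'."""
    original = sum(cycle_weights)
    shifted = shifted_cycle_sum(cycle_weights, eps)
    return shifted > 0 if original >= 0 else shifted < 0
```

With $|S| = 4$ and $\varepsilon = 1/5$, the cycles with weights (0, 0, 0), (2, -3) and (1, -1, 0, -1) all keep their sign class. With the too-large shift $\varepsilon = 1/2$, the cycle (0, 0, -1) becomes positive, which shows why the bound $\varepsilon < 1/|S|$ is needed.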

Lemma 13. For every one-dimension game $G = (S_1, S_2, E, w)$ with integer weights, for all $\varepsilon$, $0 < \varepsilon < 1/|S|$, for
every initial state $s \in S$, if $\mathcal{P}_1$ has a strategy to win the bounded window mean-payoff problem in $G_{+\varepsilon}$, then
$\mathcal{P}_1$ has a strategy to win the mean-payoff threshold problem in $G$.

Proof. Assume there exists a winning strategy of $\mathcal{P}_1$ for the bounded window mean-payoff problem in $G_{+\varepsilon}$ from
initial state $s \in S$. By Lemma 2, assertion (a), we have that this strategy ensures a non-negative mean-payoff in
$G_{+\varepsilon}$. By shifting weights by $-\varepsilon$, this can be equivalently expressed as (Prop. A) the existence of a strategy of
$\mathcal{P}_1$ ensuring a mean-payoff at least equal to $-\varepsilon$ in the game $G$.

For sufficiently small values of $\varepsilon$, that is for $0 < \varepsilon < 1/|S|$, we claim that (Prop. A) implies that (Prop. B)
$\mathcal{P}_1$ has a strategy to ensure a non-negative mean-payoff in $G$. By contradiction, assume this implication is false,
that is, we have that (Prop. A) is true and (Prop. B) is not. It implies the following.

- (Prop. A) is true: $\mathcal{P}_1$ has a memoryless strategy to ensure that the mean-payoff is at least equal to $-\varepsilon$,
  i.e., strictly greater than $-1/|S|$.

- (Prop. B) is false: $\mathcal{P}_2$ has a memoryless strategy to ensure that all simple cycles in consistent outcomes have a
  sum of weights at most $-1$. Hence, this strategy ensures a mean-payoff at most equal to $-1/|S|$.

Obviously, it is not possible to have both (Prop. A) true and (Prop. B) false for any initial state $s \in S$, hence proving
our claim. □

9 Note that $w_{+\varepsilon}$ can be transformed into an integer-valued function without changing the answers to the considered
decision problems.

Lemma 14. The one-dimension mean-payoff problem reduces in polynomial time to the bounded window mean-payoff
problem.

Proof. Let $G = (S_1, S_2, E, w)$ be a game with integer weights, and $s_{\mathrm{init}} \in S$ be the initial state. Let $\varepsilon$ be any
rational value such that $0 < \varepsilon < 1/|S|$. We claim that the answer to the mean-payoff threshold problem in $G$ is YES
if and only if the answer to the bounded window mean-payoff problem in $G_{+\varepsilon}$ is YES.

The left-to-right implication is proved in two steps. Assume the answer to the mean-payoff threshold problem in $G$
is YES. First, by Lemma 12, we have that $\mathcal{P}_1$ has a strategy to ensure a strictly positive mean-payoff in $G_{+\varepsilon}$.
Second, by Lemma 2, assertion (b), this implies that the answer to the bounded window mean-payoff problem in
$G_{+\varepsilon}$ is YES.

The right-to-left implication is a straightforward application of Lemma 13. □

Remark 1. The reduction established in Lemma 14 cannot be reversed in order to solve bounded window mean-payoff
games via classical mean-payoff games. Indeed, the reduction relies on the absence of simple cycles of value zero in
the game $G_{+\varepsilon}$, which is not verified in general if the reduction starts from arbitrary bounded window mean-payoff
games. In particular, it does not suffice to shift the weights symmetrically by $-\varepsilon$ to obtain an equivalent mean-payoff
game, as witnessed by Fig. 4, for which any negative shift gives a game losing for the mean-payoff threshold problem,
while the bounded window problem on the original game is satisfied.

Bounded window: summary. We close our study of two-player one-dimension games with Theorem 3. 

Theorem 3. In two-player one-dimension games, the bounded window mean-payoff problem is in NP ∩ coNP and at
least as hard as mean-payoff games. Memoryless strategies suffice for $\mathcal{P}_1$ and infinite-memory strategies are
required for $\mathcal{P}_2$ in general.



4.3 Games with k dimensions 

In this section, we address the case of two-player games with multi-dimension weights. For the fixed window
mean-payoff problem, we first present an EXPTIME algorithm that computes the winning states of $\mathcal{P}_1$. We also
establish lower bounds on the complexity of the fixed window problem: we show that the problem is EXPTIME-hard
(both in the case of fixed weights and arbitrary dimensions, and in the case of a fixed number of dimensions and
arbitrary weights) for arbitrary window sizes, whereas it is PSPACE-hard for polynomial window sizes. We show that
exponential memory is both sufficient and necessary in general for both players, even for polynomial window sizes.
For the bounded window mean-payoff problem, we prove non-primitive recursive hardness.

Fixed window: algorithm. We start by providing an EXPTIME algorithm via a reduction from a fixed window
mean-payoff game $G = (S_1, S_2, E, k, w)$ to an exponentially larger unweighted co-Büchi game $G^c$ (where the objective
of $\mathcal{P}_1$ is to avoid visiting a set of bad states infinitely often).

Lemma 15. The fixed window mean-payoff problem over a multi-weighted game $G$ reduces in exponential time to the
co-Büchi problem on an exponentially larger game $G^c$.

Recall that a winning play is such that, starting in some position $i \ge 0$, in all dimensions, all opening windows
are closed in at most $l_{\max}$ steps. We keep a counter of the sum over the sequence of edges and as soon as it turns
non-negative (in at most $l_{\max}$ steps), we reset the sum counter and start a new sequence (which must also become
non-negative in at most $l_{\max}$ steps). Hence, the reduction is based on tracking, for each dimension, the current negative
sum of weights since the last reset, and the number of steps that remain to achieve a non-negative sum. This accounting
is encoded in the states of $G^c = (S_1^c, S_2^c, E^c)$: from the original state space $S$, we go to
$S^c = S \times (\{-l_{\max} \cdot W, \dots, 0\} \times \{1, \dots, l_{\max}\})^k$. We add bad states, reached whenever a window reaches its
maximum size $l_{\max}$ without closing. Clearly, a play is winning for the fixed window problem if and only if the
corresponding play in $G^c$ is winning for the co-Büchi objective that asks that the set of bad states is not visited
infinitely often, as that means that from some point on, all windows close in the required number of steps.

Proof. Let $G = (S_1, S_2, E, k, w)$ be a game with objective $\mathsf{FixWMP}_G(\{0\}^k, l_{\max} \in \mathbb{N}_0)$ and initial state
$s_{\mathrm{init}} \in S$. Let $W$ denote the maximal absolute value of any edge weight in $E$. We construct the unweighted game
$G^c = (S_1^c, S_2^c, E^c)$ in the following way.

- $S_1^c = \big(S_1 \times (\{-W \cdot l_{\max}, \dots, 0\} \times \{1, \dots, l_{\max}\})^k\big) \cup \{\varrho_1, \dots, \varrho_{|S|}\}$. States
  $\varrho_1, \dots, \varrho_{|S|}$ denote special added bad states, one for each of the original states $s_1, \dots, s_{|S|} \in S$. The
  other states are built as tuples that represent (a) a visited state in $G$, (b) for each dimension, a couple modeling
  (b.1) the current sum of weights since the last time the sum in this dimension was non-negative, and (b.2) the
  number of steps that remain to reach a non-negative sum in this dimension (i.e., before reaching the maximum
  window size).

- $S_2^c = S_2 \times (\{-W \cdot l_{\max}, \dots, 0\} \times \{1, \dots, l_{\max}\})^k$.

- We construct the edges
  $\big((s_a, (\sigma_a^1, \tau_a^1), \dots, (\sigma_a^k, \tau_a^k)), (s_b, (\sigma_b^1, \tau_b^1), \dots, (\sigma_b^k, \tau_b^k))\big)$ of $E^c$ as follows.
  For all $(s_a, s_b) \in E$, let $w_e = w((s_a, s_b))$; we have

  • $\big((s_a, (\sigma_a^1, \tau_a^1), \dots, (\sigma_a^k, \tau_a^k)), \varrho_b\big) \in E^c$, with $\varrho_b$ the bad state associated to state $s_b$,
    iff $\exists\, t$, $1 \le t \le k$, such that $\tau_a^t = 1$ and $\sigma_a^t + w_e(t) < 0$,

  • $\big((s_a, (\sigma_a^1, \tau_a^1), \dots, (\sigma_a^k, \tau_a^k)), (s_b, (\sigma_b^1, \tau_b^1), \dots, (\sigma_b^k, \tau_b^k))\big) \in E^c$ iff
    $\forall\, t$, $1 \le t \le k$, we have

    * $\sigma_a^t + w_e(t) \ge 0 \rightarrow \sigma_b^t = 0,\ \tau_b^t = l_{\max}$,

    * $\sigma_a^t + w_e(t) < 0 \wedge \tau_a^t > 1 \rightarrow \sigma_b^t = \sigma_a^t + w_e(t),\ \tau_b^t = \tau_a^t - 1$,

  and we add edges $(\varrho_i, (s_i, (0, l_{\max}), \dots, (0, l_{\max})))$ to $E^c$ for all states $s_i \in S$.

Intuitively, the game $G^c$ is built by unfolding the game $G$ and integrating the current sum of weights in the states
of $G^c$, as well as the number of steps that remain to close a window, both for each dimension separately. The game
$G^c$ starts in the initial state $(s_{\mathrm{init}}, (0, l_{\max}), \dots, (0, l_{\max}))$, and each time a transition $(s, s')$ in the original
game $G$ is taken, the game $G^c$ is updated to a state $(s', (\sigma^1, \tau^1), \dots, (\sigma^k, \tau^k))$ such that (a) if the current sum
becomes non-negative in a dimension $t$, the corresponding sum counter is reset to zero and the step counter is reset to
its maximum value, $l_{\max}$, (b) if the sum is still strictly negative in a dimension $t$ and the window for this dimension is
not at its maximal size, the sum is updated and the step counter is decreased, and (c) if the sum stays strictly negative
and the maximal size is reached in any dimension, the game visits the corresponding bad state and then, all counters
are reset for all dimensions.
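
The update rules (a), (b) and (c) above can be written as a short function on the counter part of a state. This is an illustrative sketch under assumed names (`initial_counters`, `update_counters`); it returns `None` when the edge leads to a bad state, after which the caller restarts from the initial counters.

```python
from typing import List, Optional, Sequence, Tuple

Counters = List[Tuple[int, int]]  # per dimension: (current sum, remaining steps)


def initial_counters(k: int, l_max: int) -> Counters:
    """All sums at zero and all step counters at their maximum value."""
    return [(0, l_max) for _ in range(k)]


def update_counters(counters: Counters, edge_weight: Sequence[int],
                    l_max: int) -> Optional[Counters]:
    """Apply one edge of weight vector `edge_weight`; None means 'go to a bad state'."""
    successor: Counters = []
    for (sigma, tau), w in zip(counters, edge_weight):
        total = sigma + w
        if total >= 0:
            successor.append((0, l_max))        # (a) window closed: reset both counters
        elif tau > 1:
            successor.append((total, tau - 1))  # (b) still open, steps remain
        else:
            return None                         # (c) window reached l_max without closing
    return successor
```

For example, with one dimension and $l_{\max} = 2$, taking two edges of weight $-1$ from the initial counters first yields $(-1, 1)$ and then a bad state, whereas an edge of weight $+1$ taken from $(-1, 1)$ closes the window and resets the counters to $(0, 2)$.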

We argue that a play $\pi$ in $G$ is winning for the fixed window mean-payoff objective if and only if the corresponding
play $\pi^c$ in $G^c$ is winning for the co-Büchi objective asking not to visit the set $S_\varrho = \{\varrho_1, \dots, \varrho_{|S|}\}$ infinitely
often. Indeed, consider a play $\pi$ winning for objective $\mathsf{FixWMP}_G(\{0\}^k, l_{\max})$. By eq. (4), this play only sees a finite
number of bad windows (windows that are not closed in $l_{\max}$ steps in some dimension). By construction of $G^c$, the
corresponding play $\pi^c$ only visits the set $S_\varrho$ a finite number of times, hence it is winning for the co-Büchi objective.
Now, let $\pi^c$ be a winning play for the co-Büchi objective. By definition, there exists a position $i$ in $\pi^c$ such that all
states appearing after position $i$ belong to $S^c \setminus S_\varrho$. It remains to prove that for any position $j \ge i$, for any
dimension $t$, $1 \le t \le k$, there is a valid window of size at most $l_{\max}$. Again we use the inductive property of windows.
We know by construction that a reset of the sum happens in at most $l_{\max}$ steps, otherwise we go to a bad state. Assume
$j$ is a position with a sum counter of zero in some dimension $t$, and $j'$ is the next such position. Since resets are done
as soon as the sum becomes non-negative, all suffixes of the sequence from $j$ to $j'$ are non-negative. Hence, it is clear
that for all positions $j''$, $j \le j'' < j'$, the window from $j''$ to $j'$ in dimension $t$ is closed. Consequently, the
corresponding play $\pi$ in $G$ is winning for the fixed window mean-payoff objective of threshold 0 and window size
$l_{\max}$. □

As a direct corollary of this reduction, we obtain an EXPTIME algorithm to solve the fixed window mean-payoff
problem on multi-dimension games, as solving co-Büchi games takes quadratic time in the size of the game [5].

Corollary 2. Given a two-player multi-dimension game $G = (S_1, S_2, E, k, w)$ and a window size $l_{\max} \in \mathbb{N}_0$, the
fixed window mean-payoff problem can be solved in time $\mathcal{O}(|S|^2 \cdot (l_{\max})^{4k} \cdot W^{2k})$ via a reduction to co-Büchi
games.

Proof. Lemma 15 uses a co-Büchi game whose state space is of size

$$\big|S \times (\{-W \cdot l_{\max}, \dots, 0\} \times \{1, \dots, l_{\max}\})^k\big| + |S| = \mathcal{O}(|S| \cdot (l_{\max})^{2k} \cdot W^k).$$

The quadratic algorithm for co-Büchi games described in [5] implies the result. □
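
To get a feel for this bound, the exact number of states of the co-Büchi game can be computed from the construction of Lemma 15. The helper below is only an illustration (its name and parameters are assumptions): each dimension contributes $(W \cdot l_{\max} + 1) \cdot l_{\max}$ counter values, and one bad state is added per original state.

```python
def cobuchi_state_count(num_states: int, k: int, l_max: int, max_weight: int) -> int:
    """Number of states of the co-Buchi game built from a k-dimension game."""
    # Sums range over {-W * l_max, ..., 0}, step counters over {1, ..., l_max}.
    per_dimension = (max_weight * l_max + 1) * l_max
    regular_states = num_states * per_dimension ** k
    bad_states = num_states
    return regular_states + bad_states
```

With 3 states, 2 dimensions, $l_{\max} = 2$ and $W = 1$, this gives $3 \cdot 6^2 + 3 = 111$ states; the count grows exponentially with the number of dimensions $k$.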

A natural question is whether a distinct algorithm is useful in the one-dimension case. Remark 2 notes that it is. 

Remark 2. The multi-dimension algorithm described in Corollary 2 yields a procedure which is polynomial in the 
size of the state space, the window size, and the largest weight for the subclass of one-dimension games, hence only 
pseudo-polynomial (i.e., exponential in V, the length of the encoding of weights), whereas Lemma 5 gives a truly 
polynomial algorithm. 

Fixed window: lower bounds. We first consider the fixed arbitrary window mean-payoff problem for which we
show (i) in Lemma 16, EXPTIME-hardness for $\{-1, 0, 1\}$ weights and arbitrary dimensions via a reduction from
the membership problem for alternating polynomial-space Turing machines (APTMs) [3], and (ii) in Lemma 17,
EXPTIME-hardness for two dimensions and arbitrary weights via a reduction from countdown games [17].

Given an APTM $\mathcal{M}$ and a word $\zeta \in \{0, 1\}^*$, such that the tape contains at most $p(|\zeta|)$ cells, where $p$ is a
polynomial function, the membership problem asks to decide if $\mathcal{M}$ accepts $\zeta$. We build a fixed arbitrary window
mean-payoff game $G$ so that $\mathcal{P}_1$ has to simulate the run of $\mathcal{M}$ on $\zeta$, and $\mathcal{P}_1$ has a winning strategy in $G$
if and only if the word is accepted by the machine. For each tape cell $h \in \{1, 2, \dots, p(|\zeta|)\}$, we have two dimensions,
$(h, 0)$ and $(h, 1)$, such that a sum of weights of value $-1$ (i.e., an open window) in dimension $(h, i)$, $i \in \{0, 1\}$,
encodes that in the current configuration of $\mathcal{M}$, tape cell $h$ contains a bit of value $i$. In each step of the simulation,
$\mathcal{P}_1$ has to disclose the symbol under the tape head: if in position $h$, $\mathcal{P}_1$ discloses a 0 (resp. a 1), he obtains a
reward 1 in dimension $(h, 0)$ (resp. $(h, 1)$). To ensure that $\mathcal{P}_1$ was faithful, $\mathcal{P}_2$ is then given the choice to either
let the simulation continue, or assign a reward 1 in all dimensions except $(h, 0)$ and $(h, 1)$ and then restart the game
after looping in a zero self-loop for an arbitrarily long time. If $\mathcal{P}_1$ cheats by not disclosing the correct symbol under
tape cell $h$, $\mathcal{P}_2$ can punish him by branching to the restart state and ensuring a sufficiently long open window in the
corresponding dimension before restarting (as in Fig. 5). But if $\mathcal{P}_1$ discloses the correct symbol and $\mathcal{P}_2$ still
branches, all windows close. In the accepting state, all windows are closed and the game is restarted. The window size
$l_{\max}$ of the game is a function of the existing bound on the length of an accepting run. To force $\mathcal{P}_1$ to go to the
accepting state, we add an additional dimension, with weight $-1$ on the initial edge of the game and weight 1 on
reaching the accepting state.

Lemma 16. The fixed arbitrary window mean-payoff problem is EXPTIME-hard in multi-dimension games with
$\{-1, 0, 1\}$ weights and arbitrary dimensions.

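Proof. An alternating Turing machine (ATM) [3] is a tuple $\mathcal{M} = (Q, q_0, \Sigma_{\mathrm{in}}, \delta, q_{\mathrm{acc}})$ where:

- $Q$ is the finite set of control states with a partition $(Q_\vee, Q_\wedge)$ of $Q$ into existential and universal states;

- $q_0 \in Q$ is the initial state;

- $\Sigma_{\mathrm{in}} = \{0, 1\}$ is the input alphabet and $\Sigma_{\mathrm{tape}} = \Sigma_{\mathrm{in}} \cup \{\#\}$ the tape alphabet, with $\#$ the blank
  symbol;

- $\delta \subseteq Q \times \Sigma_{\mathrm{tape}} \times Q \times \Sigma_{\mathrm{tape}} \times \{-1, 1\}$ is a transition relation;

- there is a special accepting state $q_{\mathrm{acc}} \in Q_\vee$ (without loss of generality).

We say that $\mathcal{M}$ is a polynomial-space alternating Turing machine (APTM) if for some polynomial function $p$, the
space used by $\mathcal{M}$ on any input word $\zeta \in \Sigma_{\mathrm{in}}^*$ is bounded by $p(|\zeta|)$.

We define the AND-OR graph of the APTM $(\mathcal{M}, p)$ on the input word $\zeta \in \Sigma_{\mathrm{in}}^*$ as
$\mathcal{G}(\mathcal{M}, p) = (S_\vee, S_\wedge, s_0, \Delta, R)$ where

- $S_\vee = \{(q, h, t) \mid q \in Q_\vee,\ 1 \le h \le p(|\zeta|) \text{ and } t \in \Sigma_{\mathrm{tape}}^{p(|\zeta|)}\}$;

- $S_\wedge = \{(q, h, t) \mid q \in Q_\wedge,\ 1 \le h \le p(|\zeta|) \text{ and } t \in \Sigma_{\mathrm{tape}}^{p(|\zeta|)}\}$;

- $s_0 = (q_0, 1, t)$ where $t = \zeta.\#^{p(|\zeta|) - |\zeta|}$;

- $((q_1, h_1, t_1), (q_2, h_2, t_2)) \in \Delta$ iff there exists $(q_1, t_1(h_1), q, \gamma, d) \in \delta$ such that $q_2 = q$,
  $h_2 = h_1 + d$, $t_2(h_1) = \gamma$ and $t_2(h) = t_1(h)$ for all $h \neq h_1$;

- $R = \{(q, h, t) \in S_\vee \mid q = q_{\mathrm{acc}}\}$.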



Intuitively, states of the graph correspond to configurations $(q, h, t)$ where $q$ is a control state of the machine, $h$ the
position of the tape head, and $t$ the current word written on the tape. Given a state $q$ of the machine $\mathcal{M}$, tape head
on cell $h$ and a word $t$ on the tape, a transition from $(q, h, t)$ to $(q', h', t')$ exists in the graph $\mathcal{G}(\mathcal{M}, p)$ if the
transition relation $\delta$ of the machine $\mathcal{M}$ admits a transition that, given this configuration, updates the content of
cell $h$ to the symbol $t'(h)$, such that the tape now contains the word $t'$, and then goes to control state $q'$ and moves the
tape head to an adjacent cell $h'$.

A word $\zeta \in \Sigma_{\mathrm{in}}^*$ is accepted by an APTM $(\mathcal{M}, p)$ if there exists a run tree (obtained by choosing a child in
existential nodes and keeping all children in universal nodes) of $\mathcal{M}$ on $\zeta$ such that all leaves are accepting
configurations. That is, a word is accepted if and only if, in the two-player game defined by $\mathcal{G}(\mathcal{M}, p)$, player
$\mathcal{P}_\vee$ has a strategy to reach the set of accepting states $R$. Deciding the acceptance of a word by an APTM is an
EXPTIME-complete problem, known as the membership problem [3].

We construct a fixed window mean-payoff game $G = (S_1, S_2, E, k, w)$ simulating the machine $(\mathcal{M}, p)$ as follows.
Let $k = 2 \cdot p(|\zeta|) + 1$: there is a dimension for each pair $(h, 0)$ and $(h, 1)$, for all $1 \le h \le p(|\zeta|)$, and one
additional dimension. The set of states $S$ of the game is

$$\begin{aligned}
S = {} & \{q_{\mathrm{restart}}\} \cup \{q_{\mathrm{in}}\} \cup \{q_{\mathrm{acc}}\} \\
& \cup \{(q, h) \mid q \in Q,\ 1 \le h \le p(|\zeta|)\} \\
& \cup \{(q, h, i)_{\mathrm{check}} \mid q \in Q,\ 1 \le h \le p(|\zeta|),\ i \in \{0, 1\}\} \\
& \cup \{(q, h)_{\mathrm{branch}} \mid q \in Q,\ 1 \le h \le p(|\zeta|)\} \\
& \cup \{(q, h, i) \mid q \in Q,\ 1 \le h \le p(|\zeta|),\ i \in \{0, 1\}\}.
\end{aligned}$$

States of the form $(q, h)$ belong to $\mathcal{P}_1$. States of the form $(q, h, i)$ belong to $\mathcal{P}_1$ if $q \in Q_\vee$ in the machine
$\mathcal{M}$. All other states belong to $\mathcal{P}_2$. The initial state is $q_{\mathrm{restart}}$. It has two outgoing edges with weights zero
in all dimensions: one self-loop, and one edge to $q_{\mathrm{in}}$. The latter is assigned the following weights: $-1$ for dimension
$(h, i)$ if the letter at position $h$ of $\zeta$ is $i$, $-1$ in the very last dimension $(2 \cdot p(|\zeta|) + 1)$, and zero everywhere else.
From $q_{\mathrm{in}}$, the game goes to $(q_0, 1)$ and the simulation of $\mathcal{M}$ begins.




[Fig. 6: Gadget ensuring a correct simulation of the APTM on tape cell h. Panels: transitions of (q, 0); transitions
of (q, 1).]

The game mimics runs of $\mathcal{M}$, and it is ensured that if the current state of the game is $(q, h)$ and the cell content
is $i$, then the sum of weights since the last visit of $q_{\mathrm{in}}$ in dimension $(h, i)$ is $-1$. We refer to the segment of play
since the last visit of $q_{\mathrm{in}}$ as the current round. We depict a step of the simulation in Fig. 6. At state $(q, h)$,
$\mathcal{P}_1$ has the choice between states $(q, h, 0)_{\mathrm{check}}$ and $(q, h, 1)_{\mathrm{check}}$, resp. corresponding to declaring a content 0
or 1 of the tape cell $h$. The reward for dimension $(h, i)$, $i \in \{0, 1\}$, is 1 on state $(q, h, i)_{\mathrm{check}}$. At state
$(q, h, i)_{\mathrm{check}}$, a state of $\mathcal{P}_2$, $\mathcal{P}_2$ checks whether $\mathcal{P}_1$ has correctly revealed the tape content as follows:
(i) Player $\mathcal{P}_2$ can choose to go to state $(q, h)_{\mathrm{branch}}$, in which all dimensions other than $(h, 0)$ and $(h, 1)$,
including the very last, are increased by 1, and then go to $q_{\mathrm{restart}}$ on which $\mathcal{P}_2$ will be able to delay the play;
(ii) Player $\mathcal{P}_2$ can choose to proceed and continue the simulation: the game then goes to state $(q, h, i)$. State
$(q, h, i)$ is either a state of $\mathcal{P}_1$ or $\mathcal{P}_2$, depending on the affiliation of state $q$ in the APTM. Such a gadget
ensures that if $\mathcal{P}_1$ cheats by not disclosing the correct symbol, $\mathcal{P}_2$ can force an open window of arbitrary length
in the current round by looping on $q_{\mathrm{restart}}$ for some time, and then restarting the game. On the other hand, if
$\mathcal{P}_1$ is faithful and $\mathcal{P}_2$ still decides to branch to $(q, h)_{\mathrm{branch}}$, then all windows will be closed for the current
round.

If $\mathcal{P}_1$ does not cheat and $\mathcal{P}_2$ acknowledges it by not branching, the game advances to a state of the form
$(q, h, i)$. At such a state, we add transitions as follows: if there exists a transition from $(q, h, i)$ to $(q', h', i')$ in
$\mathcal{M}$, then we add an edge from $(q, h, i)$ to $(q', h')$ in the game $G$, and assign weight $-1$ in dimension $(h, i')$, as the
tape cell at position $h$ contains $i'$ and we ensure that the sum in dimension $(h, i')$ in the current round is $-1$. At the
accepting states $(q_{\mathrm{acc}}, h)$, all dimensions are assigned reward 1, and the next state is $q_{\mathrm{acc}}$. State $q_{\mathrm{acc}}$ is
followed by $q_{\mathrm{restart}}$. Again there is no risk in looping as all dimensions are now non-negative.

Formally, blank symbols need to be added. For brevity and simplicity of the presentation, we omit these technical
details.

We fix the window size $l_{\max}$ equal to three times the size of the configuration graph (bound on the length of a
run) plus three, and we argue that the game $G$ is a faithful simulation of the machine $\mathcal{M}$, that is, $\mathcal{P}_1$ wins the
fixed window mean-payoff game if and only if the word $\zeta$ is accepted by $\mathcal{M}$. Notice that the construction ensures
that if $\mathcal{P}_1$ cheats in the current round, $\mathcal{P}_2$ can make this round losing, as discussed before. Similarly, if
$\mathcal{P}_1$ does not cheat but does not reach the accepting state, dimension $2 \cdot p(|\zeta|) + 1$ will remain negative when
arriving in $q_{\mathrm{restart}}$ and $\mathcal{P}_2$ will be able to cycle long enough to make the round losing as the window in the last
dimension will remain open for $l_{\max}$ steps. Clearly, $\mathcal{P}_1$ cannot see losing rounds infinitely often, otherwise the
play is losing. Assume the word $\zeta$ is accepted by the machine. Then there is an accepting run tree, and the winning
strategy of $\mathcal{P}_1$ is to follow this run tree and always reveal the correct symbol. This way, either $\mathcal{P}_2$ restarts and
the round is winning because all dimensions are non-negative, or $\mathcal{P}_2$ does not restart and an accepting state
$(q_{\mathrm{acc}}, h)$ is reached within the maximum allowed window size. Indeed, in the APTM, there is a strategy to reach the
accepting state in a number of steps bounded by the size of the configuration graph. In that case, the round is also
winning. Conversely, assume that the word $\zeta$ is not accepted by the APTM. Consider any strategy $\lambda_1$ of
$\mathcal{P}_1$. Clearly, $\mathcal{P}_1$ cannot cheat as otherwise, he loses. So assume he does not cheat. Then there is a path in the
run tree obtained from playing the strategy $\lambda_1$ in $\mathcal{M}$ such that the path never reaches an accepting state. Hence,
the strategy $\lambda_2$ of $\mathcal{P}_2$ that follows this path in the game $G$ ensures that the sum in dimension
$2 \cdot p(|\zeta|) + 1$ is always strictly negative, and after waiting until the bound $l_{\max}$ on the window size is met,
$\mathcal{P}_2$ has made the round losing and he can restart the game safely. Acting this way infinitely often, $\mathcal{P}_2$ can
violate the fixed window objective for $\mathcal{P}_1$. It follows that $\mathcal{P}_1$ wins in $G$ if and only if the word $\zeta$ is
accepted by the APTM $\mathcal{M}$. □

We now prove EXPTIME-hardness for two dimensions and arbitrary weights via a reduction from countdown
games. A countdown game $\mathcal{C}$ consists of a weighted graph $(S, T)$, with $S$ the set of states and
$T \subseteq S \times \mathbb{N}_0 \times S$ the transition relation. Configurations are of the form $(s, c)$, $s \in S$, $c \in \mathbb{N}$. The game
starts in an initial configuration $(s_{\mathrm{init}}, c_0)$ and transitions from a configuration $(s, c)$ are performed as follows:
first $\mathcal{P}_1$ chooses a duration $d$, $0 < d \le c$, such that there exists $t = (s, d, s') \in T$ for some $s' \in S$; second,
$\mathcal{P}_2$ chooses a state $s' \in S$ such that $t = (s, d, s') \in T$. Then, the game advances to $(s', c - d)$. Terminal
configurations are reached whenever no legitimate move is available. If such a configuration is of the form $(s, 0)$,
$\mathcal{P}_1$ wins the play. Otherwise, $\mathcal{P}_2$ wins the play. Deciding the winner in countdown games given an initial
configuration $(s_{\mathrm{init}}, c_0)$ is EXPTIME-complete [17].
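
The rules of a countdown game can be made concrete with a small brute-force solver. This is a sketch for illustration only, with assumed names (`make_countdown_solver`, `p1_wins`); it explores configurations $(s, c)$ directly, so its running time is proportional to the value of $c_0$, that is, exponential in the binary encoding of $c_0$, in line with the EXPTIME-completeness stated above.

```python
from functools import lru_cache
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

State = Hashable


def make_countdown_solver(
        transitions: Iterable[Tuple[State, int, State]]) -> Callable[[State, int], bool]:
    """Return a function deciding whether player 1 wins from configuration (s, c)."""
    moves: Dict[State, Dict[int, Set[State]]] = {}
    for s, d, t in transitions:
        moves.setdefault(s, {}).setdefault(d, set()).add(t)

    @lru_cache(maxsize=None)
    def p1_wins(s: State, c: int) -> bool:
        # Durations player 1 may legally announce in (s, c).
        options = {d: ts for d, ts in moves.get(s, {}).items() if 0 < d <= c}
        if not options:
            return c == 0  # terminal configuration: player 1 wins iff the counter is 0
        # Player 1 picks a duration; player 2 then picks any matching successor.
        return any(all(p1_wins(t, c - d) for t in ts) for d, ts in options.items())

    return p1_wins
```

For instance, with transitions (a, 2, a) and (a, 3, a), player 1 wins from (a, 7) by announcing 2, 2 and 3, but loses from (a, 1), where no move is available and the counter is not zero.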

Given a countdown game $\mathcal{C}$ and an initial configuration $(s_{\mathrm{init}}, c_0)$, we create a game
$G = (S_1, S_2, E, k, w)$ with $k = 2$ and a fixed window objective for $l_{\max} = 2 \cdot c_0 + 2$. The two dimensions encode the
value of the countdown counter: one copies its value, the other its opposite. Each time a duration $d$ is chosen, an edge
of value $(-d, d)$ is taken. The game simulates the moves available in $\mathcal{C}$: a strict alternation between states of
$\mathcal{P}_1$ (representing states of $S$) and states of $\mathcal{P}_2$ (representing transitions available from a state of $S$ once a
duration has been chosen). On states of $\mathcal{P}_1$, we add the possibility to branch to a state $s_{\mathrm{restart}}$ of
$\mathcal{P}_2$, in which $\mathcal{P}_2$ can either take a zero cycle, or go back to the initial state and force a restart of the game.
By placing weights $(0, -c_0)$ on the initial edge, and $(c_0, 0)$ on the edge branching to $s_{\mathrm{restart}}$, we ensure that the
only way to win for $\mathcal{P}_1$ is to accumulate a value exactly equal to $c_0$ in the game before switching to
$s_{\mathrm{restart}}$. This is possible if and only if $\mathcal{P}_1$ can reach a configuration of value zero in $\mathcal{C}$.

Lemma 17. The fixed arbitrary window mean-payoff problem is EXPTIME-hard in multi-dimension games with two 
dimensions and arbitrary weights. 

Proof. We establish a polynomial-time reduction from the countdown game problem to the fixed arbitrary window
problem. Let $\mathcal{C} = (S, T)$ be a countdown game [17], with initial configuration $(s_{\mathrm{init}}, c_0)$. We create a
corresponding game $G = (S_1, S_2, E, k, w)$ as follows.

- $S_1 = S$.

- Let $S_T \subseteq S \times \mathbb{N}_0$ be the subset of pairs $(s, d)$ such that there exists a transition $(s, d, s') \in T$. Then,
  $S_2 = S_T \cup \{s_{\mathrm{restart}}\}$. State $s_{\mathrm{restart}}$ is the initial state of game $G$.

- For each transition $(s, d, s') \in T$, we add edges $(s, (s, d))$, with $s \in S_1$ and $(s, d) \in S_2$, and $((s, d), s')$, with
  $s' \in S_1$, to the set of edges $E$. Edge $(s, (s, d))$ has weight $(-d, d)$ and edge $((s, d), s')$ has weight $(0, 0)$.

- For all $s \in S_1$, we add an edge $(s, s_{\mathrm{restart}})$ of weight $(c_0, 0)$.

- From $s_{\mathrm{restart}}$, we add an edge $(s_{\mathrm{restart}}, s_{\mathrm{init}})$ of value $(0, -c_0)$.

- On $s_{\mathrm{restart}}$, we add a self-loop $(s_{\mathrm{restart}}, s_{\mathrm{restart}})$ of weight $(0, 0)$.
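
The construction listed above is mechanical and can be sketched as a function building the two-dimensional weighted graph. The representation is an assumption made for the example (states as hashable values, edges as a dictionary from pairs of states to weight vectors), as is the string used to name the restart state, which is assumed not to clash with a state of the countdown game.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
RESTART = "s_restart"  # hypothetical name for the added restart state


def countdown_to_window_game(states: Iterable[State],
                             transitions: Iterable[Tuple[State, int, State]],
                             s_init: State, c0: int):
    """Build (S1, S2, E with weights, l_max) from a countdown game and (s_init, c0)."""
    transitions = list(transitions)
    p1_states: Set[State] = set(states)
    p2_states: Set[State] = {(s, d) for (s, d, _t) in transitions} | {RESTART}
    edges: Dict[Tuple[State, State], Tuple[int, int]] = {}
    for s, d, t in transitions:
        edges[(s, (s, d))] = (-d, d)       # player 1 announces duration d
        edges[((s, d), t)] = (0, 0)        # player 2 resolves the successor
    for s in p1_states:
        edges[(s, RESTART)] = (c0, 0)      # player 1 may branch to the restart state
    edges[(RESTART, s_init)] = (0, -c0)    # initial edge, also used to restart
    edges[(RESTART, RESTART)] = (0, 0)     # zero self-loop used by player 2 to delay
    l_max = 2 * c0 + 2
    return p1_states, p2_states, edges, l_max
```

The resulting graph has $|S| + |S_T| + 1$ states and $2 \cdot |T| + |S| + 2$ edges, so the reduction is polynomial as long as the weights are encoded in binary.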

We fix the window size $l_{\max} = 2 \cdot c_0 + 2$, and we claim that $\mathcal{P}_1$ wins the fixed window problem if and only if he
wins the countdown game. Recall that to win a countdown game, $\mathcal{P}_1$ must be able to reach a configuration $(s, 0)$ in
the game $\mathcal{C}$. The key idea of our construction is that in the game $G$, the only way to avoid seeing infinitely often
open windows of size larger than $l_{\max}$ is to accumulate exactly $c_0$ before restarting, which is equivalent to reaching a
configuration of value 0 in $\mathcal{C}$.

Notice that the game $G$ starts by visiting an edge of value $(0, -c_0)$ and afterwards, all edges from states of
$\mathcal{P}_1$ have a value $(-d, d)$ corresponding to the duration he chooses in the countdown game, except the edge he can
decide to take to go to $s_{restart}$, whose value is $(c_0, 0)$. Clearly, if $\mathcal{P}_1$ decides to go to $s_{restart}$, he has to
close all windows, as otherwise $\mathcal{P}_2$ can use the self-loop to delay the play long enough and provoke a sufficiently
long bad window, which, if done repeatedly, induces a losing play. On the other hand, if $\mathcal{P}_1$ decides to never go
towards $s_{restart}$, he will keep accumulating negative values in the first dimension and he is guaranteed to lose. So
the behavior of $\mathcal{P}_1$ should be to play as in the countdown game to accumulate exactly $c_0$ in dimension 2 (and
$-c_0$ in dimension 1) before switching to $s_{restart}$, so that $\mathcal{P}_2$ can do no harm by delaying the play as all windows
will be closed. The accumulated value has to be exactly $c_0$ as (a) if it is less than $c_0$, dimension 2 will remain
negative, and (b) if it is more than $c_0$, dimension 1 will stay negative (i.e., the edge $(s, s_{restart})$ will not suffice
to get it back above zero). Since the minimal increase is of 1 every two edges by construction, the allowed window size
$l_{max}$ is sufficient to enforce such a behavior, if possible. This shows that $\mathcal{P}_1$ wins the fixed window problem
from initial state $s_{restart}$ in $G$ if and only if he also wins the countdown game $\mathcal{C}$ from $(s_{init}, c_0)$, as
accumulating $c_0$ in $G$ is equivalent to reaching a configuration of value zero in $\mathcal{C}$. □
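The construction used in the proof of Lemma 17 can be illustrated by a minimal Python sketch. It is only an illustration under stated assumptions: the names `build_window_game`, `path_sum` and `RESTART` are hypothetical, a countdown game is assumed to be given as a set of states and a list of triples $(s, d, s')$, and states of $\mathcal{P}_2$ are represented as pairs $(s, d)$. The helper `path_sum` only accumulates weights along a finite path; it does not decide the window objective.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

State = Hashable
Weight = Tuple[int, int]

RESTART = "s_restart"  # hypothetical name for the added restart state


def build_window_game(
    states: Iterable[State],
    transitions: Iterable[Tuple[State, int, State]],
    s_init: State,
    c0: int,
) -> Tuple[Set[State], Set[State], Dict[Tuple[State, State], Weight]]:
    """Return (S1, S2, w) following the edge list of the reduction."""
    s1 = set(states)
    s2: Set[State] = {RESTART}
    w: Dict[Tuple[State, State], Weight] = {}
    for (s, d, succ) in transitions:
        mid = (s, d)                 # P2 state: duration d was chosen in s
        s2.add(mid)
        w[(s, mid)] = (-d, d)        # P1 commits to duration d
        w[(mid, succ)] = (0, 0)      # P2 resolves the successor
    for s in s1:
        w[(s, RESTART)] = (c0, 0)    # P1 may branch to the restart state
    w[(RESTART, s_init)] = (0, -c0)  # initial (and restarting) edge
    w[(RESTART, RESTART)] = (0, 0)   # zero self-loop used by P2 to delay
    return s1, s2, w


def path_sum(w: Dict[Tuple[State, State], Weight], path: List[State]) -> Weight:
    """Component-wise sum of the weights along a finite path of states."""
    total = [0, 0]
    for edge in zip(path, path[1:]):
        total[0] += w[edge][0]
        total[1] += w[edge][1]
    return (total[0], total[1])
```

On a toy countdown game with transitions $(a, 2, b)$ and $(b, 3, a)$ and $c_0 = 5$, a path that accumulates durations $2 + 3 = 5$ before branching to the restart state sums to $(0, 0)$, whereas branching after accumulating only 2 leaves dimension 2 negative, in line with cases (a) and (b) of the proof.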

For the case of polynomial windows, Lemma 18 proves PSPACE-hardness via a reduction from generalized reachability
games [10]. Closing the gap with the EXPTIME membership given by Corollary 2 is an open problem. The generalized
reachability objective is a conjunction of reachability objectives: a winning play has to visit a state of each of a
series of $k$ reachability sets. If $\mathcal{P}_1$ has a winning strategy in a generalized reachability game
$G^r = (S^r_1, S^r_2, E^r)$, then he has one that guarantees a visit of all sets within $k \cdot |S^r|$ steps. We create a
modified weighted version of the game, $G = (S_1, S_2, E, k, w)$, such that the weights are $k$-dimension vectors. The game
starts by opening a window in all dimensions and the only way for $\mathcal{P}_1$ to close the window in dimension $t$,
$1 \le t \le k$, is to reach a state of the $t$-th reachability set. We modify the game by giving $\mathcal{P}_2$ the ability to
close all open windows and restart the game, so that the prefix-independence of the fixed window objective cannot help
$\mathcal{P}_1$ to win without reaching the target sets. Then, a play is winning in $G$ for the fixed window objective of
size $l_{max} = 2 \cdot k \cdot |S^r|$ if and only if it is winning for the generalized reachability objective in $G^r$.

Lemma 18. The fixed polynomial window mean-payoff problem is PSPACE-hard. 

Proof. We show the PSPACE-hardness by a reduction from the generalized reachability problem [10]. Given a game
graph $G^r = (S^r_1, S^r_2, E^r)$, a series of reachability sets $R_t \subseteq S^r$, for $1 \le t \le k$, with $k < |S^r|$, and an
initial state $s^r_{init} \in S^r$, the generalized reachability problem asks if there exists a strategy of $\mathcal{P}_1$ such
that any consistent outcome starting in $s^r_{init}$ visits a state of each set $R_t$ at least once. It is known that if such a
strategy exists, then there exists one which ensures reaching all sets in at most $k \cdot |S^r|$ steps.

We build a $k$-dimension fixed window mean-payoff game $G = (S_1, S_2, E, k, w)$ as follows. We define
$S_{branch} \subseteq S_2$, the set of $\mathcal{P}_2$ states such that for all $s, s' \in S^r$ such that $(s, s') \in E^r$, we have that
$b_{s,s'} \in S_{branch}$. Let $S_1 = S^r_1$ and $S_2 = S^r_2 \cup S_{branch} \cup \{s_{restart}\}$. Let $E$ be the set of edges such that
for all $(s, s') \in E^r$, we have that $(s, b_{s,s'}) \in E$, $(b_{s,s'}, s') \in E$, $(b_{s,s'}, s_{restart}) \in E$, and such that
$(s_{restart}, s^r_{init}) \in E$. That is, we introduce in all edges of $E^r$ a state of $\mathcal{P}_2$ that lets him branch to an
added state $s_{restart}$ or continue as in $G^r$. The new initial state in $G$ is $s_{restart}$, and there is an edge from
$s_{restart}$ to the old initial state $s^r_{init}$. The weights are as follows: all edges from states $b_{s,s'}$ to $s_{restart}$
have value 1 in all dimensions. The edge from $s_{restart}$ to $s^r_{init}$ has value $-1$ in all dimensions. All other edges of
the game have value zero, except edges entering a state that belongs to a reachability set $R_t$, which have value 1 in
dimension $t$ and 0 in the other dimensions. If a state belongs to several sets, then all corresponding dimensions get
a 1.

We claim that $\mathcal{P}_1$ has a winning strategy for $FixWMP_G(\{0\}^k, l_{max} = 2 \cdot k \cdot |S^r|)$ if and only if he has a
winning strategy for the generalized reachability objective in $G^r$. Consider the game $G$. Clearly, the only edge
involving negative values is $(s_{restart}, s^r_{init})$, whose value is $(-1, \ldots, -1)$. Therefore, a losing play for eq. (4)
should see this edge infinitely often, as it is the starting position of all open windows. Notice that on the other
hand, going from a state $b_{s,s'}$ to $s_{restart}$ involves an edge of value $(1, \ldots, 1)$, hence if the open window starting
in $s_{restart}$ comes back to $s_{restart}$ before hitting its maximal size, the window will close. So the strategy of
$\mathcal{P}_2$ should be to wait for $l_{max} = 2 \cdot k \cdot |S^r|$ steps before forcing a restart. Now, consider a winning strategy
$\lambda_1$ of $\mathcal{P}_1$ in $G$. Because of the strategy of $\mathcal{P}_2$, $\lambda_1$ has to ensure obtaining $+1$ in all dimensions
by only using transitions entering states of $S^r$. By construction, this implies that $\lambda_1$ enforces a visit of all
reachability sets, and thus is winning for the generalized reachability problem. Consider the reverse implication.
Let $\lambda^r_1$ be a winning strategy in $G^r$. There exists such a strategy that ensures seeing all reachability sets (thus
closing all windows) in at most $l_{max} = 2 \cdot k \cdot |S^r|$ steps if $\mathcal{P}_2$ does not branch to $s_{restart}$. On the other
hand, if $\mathcal{P}_2$ does branch before $l_{max}$ steps, all windows also close, as branching edges have value
$(1, \ldots, 1)$. Hence, this strategy is also winning for $FixWMP_G(\{0\}^k, l_{max})$. This shows the correctness of the
reduction and concludes our proof. □
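To make the weights of this reduction and the window-closing argument concrete, the following Python sketch builds the weight function of $G$ from a generalized reachability instance and computes, for a finite play prefix, after how many steps each window closes in each dimension. It is a sketch under assumptions: the function names, the `RESTART` constant and the encoding of branching states as tuples `("b", s, s')` are hypothetical, and the closing test assumes the fixed window definition of Sec. 4.1 with threshold zero (a window opened at position $i$ closes in dimension $t$ as soon as the sum of weights from $i$ is non-negative in $t$).

```python
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Set, Tuple

State = Hashable
Vector = Tuple[int, ...]

RESTART = "s_restart"  # hypothetical name for the added restart state


def build_game(
    edges_r: Iterable[Tuple[State, State]],
    targets: Sequence[Set[State]],
    s_init: State,
) -> Dict[Tuple[State, State], Vector]:
    """Weight function of G; the keys of the result are the edges of G."""
    k = len(targets)
    w: Dict[Tuple[State, State], Vector] = {
        (RESTART, s_init): tuple([-1] * k)  # opens a window in all dimensions
    }
    for (s, succ) in edges_r:
        b = ("b", s, succ)                  # branching state of P2 on edge (s, succ)
        w[(s, b)] = tuple([0] * k)
        # entering a state of R_t yields 1 in dimension t, 0 elsewhere
        w[(b, succ)] = tuple(1 if succ in r else 0 for r in targets)
        w[(b, RESTART)] = tuple([1] * k)    # branching closes all windows
    return w


def play_weights(w: Dict[Tuple[State, State], Vector], play: List[State]) -> List[Vector]:
    """Sequence of weight vectors along a finite play given as a list of states."""
    return [w[edge] for edge in zip(play, play[1:])]


def closing_times(weights: List[Vector], l_max: int) -> List[List[Optional[int]]]:
    """For each position i and dimension t, the least l <= l_max such that the
    sum of the l weights starting at i is non-negative in t, or None if the
    window does not close within l_max steps on this finite prefix."""
    n = len(weights)
    k = len(weights[0]) if weights else 0
    result: List[List[Optional[int]]] = []
    for i in range(n):
        row: List[Optional[int]] = []
        for t in range(k):
            total, closed = 0, None
            for l in range(1, l_max + 1):
                if i + l > n:
                    break
                total += weights[i + l - 1][t]
                if total >= 0:
                    closed = l
                    break
            row.append(closed)
        result.append(row)
    return result
```

For instance, on a three-state cycle $a \to b \to c \to a$ with $R_1 = \{b\}$ and $R_2 = \{c\}$, the window opened by the initial edge closes in dimension 1 when $b$ is entered and in dimension 2 when $c$ is entered, well within $l_{max} = 2 \cdot k \cdot |S^r| = 12$ steps; if $\mathcal{P}_2$ branches to the restart state earlier, the edge of value $(1, 1)$ closes both windows at once.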




Fig. 7: Family of games requiring exponential memory: $\forall\, 1 \le i \le K$, $\forall\, 1 \le j \le k$,
$w((s_i, s_{i,L}))(j) = 1$ if $j = 2 \cdot i - 1$, $= -1$ if $j = 2 \cdot i$, and $= 0$ otherwise;
$w((s_i, s_{i,L})) = -w((s_i, s_{i,R})) = w((t_i, t_{i,L})) = -w((t_i, t_{i,R}))$; $w((\circ, s_i)) = w((\circ, t_i)) = (0, \ldots, 0)$.

We conclude our study of the multi-dimension fixed window problem by considering memory bounds. A direct
corollary of Lemma 15 is the existence of winning strategies of at most exponential size for both players, as
memoryless strategies are sufficient in co-Büchi games [9]. A corollary of the reduction from generalized reachability
games to the fixed polynomial window problem used to prove Lemma 18 and the results of [10, Lemma 2] (showing
exponential lower bounds on memory for generalized reachability objectives) is that such memory is needed in general,
again for both players. Another example of a family of games in which $\mathcal{P}_1$ requires exponential memory (in the
number of dimensions) is given by the family defined in [6, Lemma 6] (Fig. 7), introduced in the context of multi
energy games. All examples have in common that the players must be able to differentiate between an exponential number
of histories and act accordingly to achieve their objective: in the game of Fig. 7, $\mathcal{P}_1$ wins objective
$FixWMP_G(\{0\}^k, l_{max} = |S|/2)$ only if he is able to make in $t_i$ the opposite choice of $\mathcal{P}_2$ in $s_i$, which
requires a strategy encoded as a Moore machine with at least $2^{k/2}$ states. Lemma 19 sums up these results.

Lemma 19. In multi-dimension games with a fixed window mean-payoff objective, exponential memory is both
sufficient and necessary for both players in general, even for polynomial window sizes.

Fixed window: summary. We summarize the complexity of the fixed window problem in Theorem 4. 

Theorem 4. In two-player multi-dimension games, the fixed arbitrary window mean-payoff problem is EXPTIME- 
complete, and the fixed polynomial window mean-payoff problem is PSPACE-hard. For both players, exponential 
memory is sufficient and is required in general. 

Bounded window. Unlike the one-dimension case, in which it is easier to decide the bounded problem than the fixed
arbitrary one (i.e., the problem becomes easier when the fixed window size is sufficiently large), we prove that the
complexity of the bounded window problem in multi-weighted games is at least non-primitive recursive.$^{10}$ Hence,
there is no hope for efficient algorithms on the complete class of two-player multi-weighted games. Decidability of
the bounded problem is of purely theoretical interest and is left open.

This result is obtained through a reduction from the problem of deciding the existence of an infinite execution in a 
marked reset net, also known as the termination problem. A marked reset net [7] is a Petri net with reset arcs together 
with an initial marking of its places. Reset arcs are special arcs that reset a place (i.e., empty it of all its tokens). The 
termination problem for reset nets is decidable but non-primitive recursive hard (as follows from the results of [25], 
also discussed in [19]). 



Fig. 8: Careful alternation between gadgets is needed in order for $\mathcal{P}_1$ to win.

Given a reset net $\mathcal{N}$ with an initial marking $m_0 \in \mathbb{N}^{|P|}$ (where $P$ is the set of places of the net), we
build a two-player multi-weighted game $G$ with $k = |P| + 3$ dimensions such that $\mathcal{P}_1$ wins the bounded window
objective for threshold $\{0\}^k$ if and only if $\mathcal{N}$ does not have an infinite execution from $m_0$.

A high-level description of our reduction is as follows. The structure of the game (Fig. 8) is based on the
alternation between two gadgets simulating the net (Fig. 9). Edges are labeled by $k$-dimension weight vectors such
that the first $|P|$ dimensions are used to encode the number of tokens in each place. In each gadget, $\mathcal{P}_2$
chooses transitions to simulate an execution of the net. During a faithful simulation, there is always a running open
window in all the first $|P|$ dimensions: if place $p$ contains $n$ tokens then the negative sum from the start of the
simulation is $-(n + 1)$. This is achieved as follows: if a transition $t$ consumes $I(t)(p)$ tokens from $p$, then this
value is added on the corresponding dimension, and if $t$ produces $O(t)(p)$ tokens in $p$, then $O(t)(p)$ is removed from
the corresponding dimension. When a place $p$ is reset, a gadget ensures that dimension $p$ reaches value $-1$ (the
coding of zero tokens). If all executions terminate, $\mathcal{P}_2$ has to choose an unfireable transition at some point,
consuming unavailable tokens from some place $p \in P$. If so, the window in dimension $p$ closes. After each transition
choice of $\mathcal{P}_2$, $\mathcal{P}_1$ can either continue the simulation or branch out of the gadget to close all windows,
except in some dimension $p$ of his choice. Then $\mathcal{P}_2$ can arbitrarily extend any still open window in the first
$(|P| + 1)$ dimensions and restart the game afterwards. Dimension $(|P| + 1)$ prevents $\mathcal{P}_1$ from staying forever
in a gadget. If an infinite execution exists, $\mathcal{P}_2$ simulates it and never has to choose an unfireable transition.
Hence, when $\mathcal{P}_1$ branches out, the window in some dimension $p$ stays open. The last two dimensions force him to
alternate between gadgets so that he cannot take advantage of the prefix-independence to win after a faithful
simulation. So, $\mathcal{P}_2$ can delay the closing of the open window for longer and longer, thus winning the game.

Theorem 5. In two-player multi-dimension games, the bounded window mean-payoff problem is non-primitive
recursive hard.

Proof. We give a reduction from the termination problem on reset nets to the bounded window problem on two-player
multi-weighted games. The former is known to be non-primitive recursive hard [25,19].

10 That is, there exists no primitive recursive function that computes the answer to the bounded window problem. A well-known
example of a decidable but non-primitive recursive function is the Ackermann function [1].

Fig. 9: Gadget simulating an execution of the reset net.



Let $\mathcal{N} = (P, T, I, O, r)$ be a reset net such that

- $P = \{p_1, p_2, \ldots, p_{|P|}\}$ is the set of places;

- $T = \{t_1, t_2, \ldots, t_{|T|}\}$ is the set of transitions;

- $I \colon T \to \mathbb{N}^{|P|}$ is the input function, such that for each transition $t \in T$, $I(t)$ is a $|P|$-dimension
vector such that for all dimensions $p \in \{1, \ldots, |P|\}$, $I(t)(p)$ specifies the number of tokens from place $p$
consumed by the transition $t$;$^{11}$

- $O \colon T \to \mathbb{N}^{|P|}$ is the output function, such that for each transition $t \in T$, $O(t)$ is a $|P|$-dimension
vector such that for all dimensions $p \in \{1, \ldots, |P|\}$, $O(t)(p)$ specifies the number of tokens produced in place
$p$ by the transition $t$;

- $r \colon T \to P$ is the reset function, such that for all transitions $t \in T$, $r(t)$ specifies the unique place
(w.l.o.g.) which is reset by transition $t$.

Given an initial marking of the places (i.e., an initial number of tokens in each place) $m_0 \in \mathbb{N}^{|P|}$, the
termination problem asks if there exists an infinite execution of the net, that is, if there exists an infinite
sequence of transitions that can be fired from $m_0$. A transition $t$ is fireable from marking $m \in \mathbb{N}^{|P|}$ if for
all places $p \in P$, $I(t)(p) \le m(p)$. An execution terminates if no transition can be fired because the necessary
tokens are unavailable. We first note an important monotonicity property of reset nets: for every reset net
$\mathcal{N} = (P, T, I, O, r)$ and all markings $m, n \in \mathbb{N}^{|P|}$, if $m \le n$ and $\rho \in T^\omega$ is an infinite sequence of
transitions fireable from $m$, then $\rho$ is also fireable from $n$. This property is used later on.
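The definitions above translate into a short Python sketch of the firing rule, which also lets one check the monotonicity property on small examples. The class and function names are hypothetical, and the order in which a transition acts (consume, then reset the place $r(t)$, then produce) is an assumption chosen to mirror the order of the edges in the gadget of Fig. 9.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Marking = Tuple[int, ...]


@dataclass(frozen=True)
class Transition:
    consume: Marking  # I(t): tokens consumed in each place
    produce: Marking  # O(t): tokens produced in each place
    reset: int        # index of the unique place r(t) reset by t


def fireable(t: Transition, m: Marking) -> bool:
    """t is fireable from m if I(t)(p) <= m(p) for every place p."""
    return all(c <= tokens for c, tokens in zip(t.consume, m))


def fire(t: Transition, m: Marking) -> Marking:
    """Marking reached by firing t from m (assumed order: consume, reset, produce)."""
    if not fireable(t, m):
        raise ValueError("transition is not fireable from this marking")
    after = [tokens - c for c, tokens in zip(t.consume, m)]
    after[t.reset] = 0
    return tuple(tokens + p for tokens, p in zip(after, t.produce))


def run(sequence: Sequence[Transition], m: Marking) -> Optional[Marking]:
    """Fire a finite sequence from m; return None if some transition is blocked."""
    for t in sequence:
        if not fireable(t, m):
            return None
        m = fire(t, m)
    return m
```

With a single transition consuming one token from the first place, producing two in the second and resetting the first, firing from $(1, 0)$ yields $(0, 2)$ and firing from the larger marking $(3, 1)$ yields $(0, 3)$: the larger marking stays larger, which is the finite counterpart of the monotonicity property.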

We claim that given a reset net $\mathcal{N}$ and an initial marking $m_0$, we can build in polynomial time a multi-weighted
game $G$ in which $\mathcal{P}_1$ has a winning strategy for objective $BndWMP_G(0)$ if and only if there exists no infinite
execution of the net $\mathcal{N}$ from $m_0$.

We build the game $G = (S_1, S_2, E, k, w)$ with $k = |P| + 3$ as represented in Fig. 8 and Fig. 9. Unlabeled edges have
value zero in all dimensions. For clarity, we define the following $|P|$-dimension integer vectors: $\mathbf{1} = (1, \ldots, 1)$
is the unit vector, $\mathbf{0} = (0, \ldots, 0)$ is the zero vector, and, for $a, b \in \mathbb{Z}$, $p \in P$, vector $a_{p \to b}$
represents the vector $(a, \ldots, a, b, a, \ldots, a)$ which has value $b$ in dimension $p$ and $a$ in all the others. The first
$|P|$ dimensions of the game are used to encode the tokens present in each place, whereas the last three are used to
compel $\mathcal{P}_1$ to act fairly. Our construction will ensure that at all times along a valid execution of the net in a
gadget, if a place $p \in P$ contains $n$ tokens, then the running sum of weights over the largest open window has value
$(-n - 1)$ in dimension $p$.

The states and edges of the game are built as follows. 

- Inside a gadget, we have a state $fire$ belonging to $\mathcal{P}_2$, with $|T|$ outgoing edges corresponding to the $|T|$
transitions of the net. Each transition $t$ is encoded as follows:

• an edge from $fire$ to a state $test_t$ belonging to $\mathcal{P}_1$, of value $(I(t), -1, 0, 0)$, such that the running sum
is updated to accurately encode the consumption of tokens;

• in state $test_t$, $(|P| + 1)$ outgoing edges, giving $\mathcal{P}_1$ the possibility to either branch out of the gadget,
going to the state $close_p$ corresponding to the dimension $p$ of his choice, or continue via an edge of value
$(\mathbf{0}, -1, 0, 0)$ to the $reset_q$ state, a state of $\mathcal{P}_1$ such that $q = r(t)$ is the unique place reset by
transition $t$;

• a self-loop of value $(\mathbf{0}_{q \to 1}, -1, 0, 0)$ on the $reset_q$ state;

• an edge from $reset_q$ to $out_t$ of value $(\mathbf{0}_{q \to -1}, -1, 0, 0)$ whose purpose is to ensure that in dimension
$q$, there is a new open window of sum $-1$ after a full reset (i.e., it encodes that the number of tokens in place $q$
is zero);

• an edge from $out_t$ back to $fire$ of value $(-O(t), -1, 0, 0)$, producing tokens according to the output of
transition $t$.

11 For simplicity, we use $p$ to refer to a place $p \in P$ and to the number $i \in \{1, \ldots, |P|\}$ such that $p_i = p$, that is,
$p$ indistinctly refers to the place and the corresponding dimension in the weight vectors.

- Branching from the left gadget leads to a state $close_p^{left}$ of $\mathcal{P}_1$ with a self-loop of weight
$(\mathbf{1}_{p \to 0}, 1, 1, -1)$ and an outgoing edge to state $delay^{left}$ of $\mathcal{P}_2$.

- State $delay^{left}$ possesses a self-loop of value $(\mathbf{0}, 0, 1, 1)$ and an edge going to the right gadget with value
$(-m_0 - \mathbf{1}, 0, 0, 0)$.

- The right gadget is constructed symmetrically, the only change being that the self-loop on states $close_p^{right}$ of
$\mathcal{P}_1$ now has value $(\mathbf{1}_{p \to 0}, 1, -1, 1)$.

The game starts in the left gadget with an initial edge of value $(-m_0 - \mathbf{1}, 0, 0, 0)$ corresponding to the initial
marking of the net.
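The encoding of markings by running sums can be checked on one pass through a gadget with a small Python sketch. It rests on several assumptions that go beyond the text: the function names are hypothetical, only the first $(|P| + 1)$ dimensions are tracked, windows are assumed to stay open during the pass (so plain accumulation is enough), and $\mathcal{P}_1$ is assumed to perform a full reset, looping on the reset state until the sum in the reset dimension reaches 0.

```python
from typing import List, Optional, Sequence, Tuple


def encode(marking: Sequence[int]) -> List[int]:
    """Running sums encoding a marking: n tokens in place p give -(n + 1)."""
    return [-(n + 1) for n in marking]


def decode(sums: Sequence[int]) -> Tuple[int, ...]:
    """Inverse of encode."""
    return tuple(-s - 1 for s in sums)


def gadget_step(
    place_sums: Sequence[int],
    extra: int,
    consume: Sequence[int],
    produce: Sequence[int],
    q: int,
) -> Optional[Tuple[List[int], int]]:
    """Follow fire -> test_t -> reset_q (full reset) -> out_t -> fire.

    `place_sums` are the sums in the |P| place dimensions and `extra` is the
    sum in dimension |P| + 1. Returns None when the transition is unfireable,
    i.e. when some place dimension becomes non-negative after consumption.
    """
    # fire -> test_t has weight (I(t), -1, 0, 0)
    sums = [s + c for s, c in zip(place_sums, consume)]
    extra -= 1
    if any(s >= 0 for s in sums):
        return None  # a window closed: P2 cheated on some place
    # test_t -> reset_q has weight (0, -1, 0, 0)
    extra -= 1
    # self-loop on reset_q adds 1 in dimension q and -1 in dimension |P| + 1
    while sums[q] < 0:
        sums[q] += 1
        extra -= 1
    # reset_q -> out_t puts dimension q at -1, the coding of zero tokens
    sums[q] -= 1
    extra -= 1
    # out_t -> fire has weight (-O(t), -1, 0, 0)
    sums = [s - o for s, o in zip(sums, produce)]
    extra -= 1
    return sums, extra
```

Starting from the encoding of marking $(1, 0)$, a transition that consumes one token from the first place, resets it and produces two tokens in the second place leads to sums $(-1, -3)$, that is, to the encoding of marking $(0, 2)$. Each simulated transition costs four edges plus the reset loops in dimension $(|P| + 1)$, which is the quantity that $f(b)$ has to compensate in Case (i) below.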

We claim that (i) if there exists no infinite execution $\rho \in T^\omega$ of the net $\mathcal{N}$, then $\mathcal{P}_1$ has a winning
strategy in $G$ for the bounded window objective, and (ii) if there exists such an execution, then $\mathcal{P}_2$ has a
winning strategy in $G$. By determinacy, proving both claims will conclude our proof.

Case (i). Assume that there exists no infinite execution $\rho \in T^\omega$ of the net. Then there exists a bound
$b \in \mathbb{N}$ on the length of any valid execution. Hence, $\mathcal{P}_2$ can only simulate the net faithfully for $b$ steps,
so after at most $(b + 1)$ steps, he needs to use an unfireable transition. That is, the next chosen transition requires
more tokens than available in some place $p \in P$. We define a winning strategy $\lambda_1 \in \Lambda_1$ of $\mathcal{P}_1$ in $G$
as follows:

1. In a state $test_t$, if the last transition $t$ was valid (i.e., all first $|P|$ dimensions have a negative running
sum), go to the corresponding $reset_q$ state. Otherwise, there exists a dimension $p$ in which the sum has become
non-negative and all windows are closed: exit the gadget and go to the corresponding state $close_p$.

2. In a state $reset_q$, cycle until the sum in dimension $q$ takes value 0, then go to state $out_t$.

3. In a state $close_p$, take the loop exactly $f(b)$ times before going to state $delay$, where
$f \colon \mathbb{N} \to \mathbb{N}$ is a well-chosen function that we define below (hence $f(b)$ is constant along the play).

We claim that it is possible to define $f(b)$ sufficiently large to ensure that this strategy is winning. Let
$M \in \mathbb{N}$ be the largest number of tokens produced as output of any transition of the net, on any place. We consider
the value of the negative sum in any of the first $(|P| + 1)$ dimensions at the moment when $\mathcal{P}_1$ decides to exit the
gadget according to the strategy $\lambda_1$. Notice that for any dimension $p \in \{1, \ldots, |P|\}$, this sum is bounded by
$x = (-m_0(p) - 1 - b \cdot M)$. Hence, the number of loops taken on any visit of state $reset_q$ is bounded by $x$. The sum
in dimension $(|P| + 1)$ is thus bounded by $(b \cdot (4 + x) + 1)$, which we define as $f(b)$. The last two dimensions are
not modified inside a gadget. Now clearly, looping in state $close_p$ for $f(b)$ steps is sufficient to close all windows
in all dimensions corresponding to places (recall that dimension $p$ is closed by $\mathcal{P}_2$ cheating on place $p$), as
well as in dimension $(|P| + 1)$. However, this loop opens a window in one of the last two dimensions (the last for the
left gadget, and the second to last for the right gadget). As the $delay$ state of $\mathcal{P}_2$ has a positive effect in
those dimensions, if $\mathcal{P}_2$ decides to delay the play for $f(b)$ steps, all windows will be closed. If he does not
delay, the play will proceed to the next gadget, in which $\mathcal{P}_2$ is also forced to cheat before $(b + 1)$
transitions. Hence after looping for $f(b)$ steps in the corresponding $close_p$ state, the open window will close (and
another will open in the other dimension which will in turn be closed after the next gadget). By keeping this behavior,
$\mathcal{P}_1$ can thus enforce that any open window along the play will close in at most $(4 \cdot f(b) + 4)$ steps. Thus the
outcome is winning for the bounded window objective.

Case (ii). Assume that there exists an infinite execution $\rho \in T^\omega$ of the net. We define a winning strategy
$\lambda_2 \in \Lambda_2$ of $\mathcal{P}_2$ as follows. The strategy is played in rounds, with the initial round being round 1.

1. Every time a gadget is entered, start playing in state $fire$ according to the infinite execution $\rho$, that is,
choose transitions in order to obtain the same trace.

2. When a state $delay$ is visited during round $n$, take the self-loop $n$ times then continue to state $fire$ and start
round $n + 1$.

Notice that this strategy requires infinite memory. We claim that any consistent outcome of the game is winning for
$\mathcal{P}_2$, that is, it does not belong to $BndWMP_G(0)$. First, $\mathcal{P}_1$ cannot stay forever in a gadget, thanks to
dimension $(|P| + 1)$: he has to branch at some point, otherwise the play is lost. Second, if in state $reset_q$,
$\mathcal{P}_1$ decides to cycle fewer times than necessary for a full reset, the situation gets better for $\mathcal{P}_2$ by
the monotonicity property of the reset net (as $\mathcal{P}_2$ gets to continue with more tokens than expected). Notice that
$\mathcal{P}_1$ cannot accumulate positive values in the sum, as the next edge will restart a new window and all
accumulation will be forgotten with regard to the objective. Third, if $\mathcal{P}_1$ branches and exits the gadget to go to
some state $close_p$, then all dimensions corresponding to places, including dimension $p$, have a running open window
(dimension $p$ has a strictly negative value since $\mathcal{P}_2$ does not cheat). Hence, no matter how long $\mathcal{P}_1$
chooses the self-loop, the window in dimension $p$ will stay open (and $\mathcal{P}_1$ cannot stay here forever because of
the last two dimensions). Fourth, when the play reaches a state $delay$ with an open window in dimension
$p \in \{1, \ldots, |P|\}$, the strategy $\lambda_2$ prescribes that $\mathcal{P}_2$ will loop for longer and longer periods of time,
thus enforcing open windows of constantly growing length. As a consequence, any consistent outcome is such that the
bounded window objective is not satisfied, which proves our point and concludes our proof. □

Notice that Theorem 5 implies that $\mathcal{P}_1$ may need to use a non-primitive recursive window size to win a
multi-dimension bounded window mean-payoff game, whereas a pseudo-polynomial bound exists in the one-player case
(see Corollary 1).

4.4 On direct objectives 

Throughout this paper, we have studied the prefix-independent versions of the objectives defined in Sec. 4.1. In this
section, we briefly argue that similar complexity results are obtained for the direct variants (Table 2), by slight
modifications of the presented proofs. Notice that memory requirements however change, as it is now sufficient to force
one sufficiently long (for the fixed problem) or never closing (for the bounded problem) window to make an outcome
losing.





                                  one-dimension                                        k-dimension
                                  complexity         P1 mem.      P2 mem.              complexity            P1 mem.   P2 mem.
--------------------------------------------------------------------------------------------------------------------------------
direct fixed polynomial window    P-c.               mem. req. ≤ linear(|S| · l_max)   PSPACE-h., EXP-easy   exponential
direct fixed arbitrary window     P(|S|, V, l_max)   (as above)                        EXP-c.                (as above)
direct bounded window problem     NP ∩ coNP          mem-less     linear*              NPR-h.

Table 2: Complexities and memory requirements for the direct objectives. Differences with the prefix-independent
objectives are marked with an asterisk.

One-dimension direct fixed window problem. The polynomial algorithm in the size of the game and the size of
the window is given by Lemma 4. For polynomial windows, we obtain P-hardness using the proof of Lemma 7 and
window size $l_{max} = 2 \cdot |S|$, as if $\mathcal{P}_1$ can win the reachability game, he has a strategy to do it in at most $|S|$
steps. Lemma 6 extends to direct objectives, and provides linear upper bounds on memory with the same arguments. In
particular, the provided examples of games require memory for both players when the direct fixed window objective is
considered.

One-dimension direct bounded window problem. We obtain an NP $\cap$ coNP algorithm for the direct bounded problem
by simplifying BoundedProblem (Lemma 10) as follows: BoundedProblem($G$) = $S \setminus$ UnbOpenWindow($G$). Indeed,
as the objective is no longer prefix-independent, it is sufficient for $\mathcal{P}_2$ to force one window that never closes
to make the play losing. Hence, the attractor of the set $S \setminus L$ in algorithm BoundedProblem cannot be declared
winning for $\mathcal{P}_1$. While memoryless strategies still suffice for $\mathcal{P}_1$ (applying the arguments of Lemma 10),
winning strategies for $\mathcal{P}_2$ do not need infinite memory anymore, but at most linear memory. Indeed, a winning
strategy of $\mathcal{P}_2$ is the one described in the proof of Lemma 10, but without taking rounds into account (i.e., the
play stays forever in round one). To illustrate that memoryless strategies still do not suffice for $\mathcal{P}_2$, consider
a variation of Fig. 5, with the initial state being $s_2$. Clearly, $\mathcal{P}_2$ must first take the cycle to $s_1$ then loop
forever on $s_2$ to ensure a never closing window. Corollary 1 extends to the direct case and gives the same bound on the
window size. Finally, the reduction of mean-payoff games developed in Lemma 14 carries over to the direct bounded
window objective, as the game with shifted weights is such that the mean-payoff is strictly positive. In that case, the
supremum total-payoff is infinite and Lemma 2 applies, implying the result.

Multi-dimension direct fixed window problem. The following results extend to the direct case. 

- EXPTIME algorithm. Lemma 15 presents a reduction from fixed window games to exponentially larger co-Büchi
games. It is easy to obtain a similar reduction from direct fixed window games by considering a safety objective
for $\mathcal{P}_1$ (i.e., reachability for the set of bad states for $\mathcal{P}_2$). This also implies an exponential-time
algorithm.

- EXPTIME-hardness of the arbitrary window problem for weights $\{-1, 0, 1\}$ and arbitrary dimensions. The
reduction of the membership problem for polynomial space alternating Turing machines immediately yields the result
for the direct objective. Indeed, the strategies proposed in the proof stay winning for this objective. Note that
actually the strategy of $\mathcal{P}_2$ may be simpler, as he may cycle forever on $s_{restart}$ after branching to punish an
unfaithful symbol disclosure by keeping a window indefinitely open.

- EXPTIME-hardness of the arbitrary window problem for two dimensions and arbitrary weights. The reduction
from countdown games established in Lemma 17 extends straightforwardly to direct objectives, and $\mathcal{P}_2$ can use a
simpler winning strategy consisting in looping forever in its zero cycle.

- PSPACE-hardness of the polynomial window problem. The reduction of generalized reachability games also holds
without modification for the direct fixed polynomial window objective.

- Exponential memory bounds. Exponential upper bounds follow from the modified Lemma 15, using safety games.
The lower bounds of Lemma 19 also hold for the direct objective, both in the presented game and through the reduction
of generalized reachability games.

Multi-dimension direct bounded window problem. Non-primitive recursive hardness (Theorem 5) extends to the
direct objective with a simpler construction. Indeed, it is sufficient to consider the game using only the first
$(|P| + 1)$ dimensions, and consisting of only one gadget, with the branching out of the gadget now going to an absorbing
state with a self-loop of weight $\mathbf{1}_{p \to 0}$ such that when $\mathcal{P}_1$ decides to branch, all windows get closed
eventually, except in the dimension $p$ of his choice, for which the window is only closed if $\mathcal{P}_2$ cheats and stays
open forever otherwise.

5 Conclusion 

We showed that the strong relation between mean-payoff and total-payoff objectives breaks in multi-weighted games 
and we proved the undecidability of the total-payoff threshold problem in such games. We introduced new quantitative 
objectives providing conservative approximations and studied their complexities. We also studied the memory required 
by associated winning strategies. Our results are summarized in Table 1. 

References 

1. W. Ackermann. Zum Hilbertschen Aufbau der reellen Zahlen. Mathematische Annalen, 99(1):118-133, 1928.

2. L. Brim, J. Chaloupka, L. Doyen, R. Gentilini, and J.-F. Raskin. Faster algorithms for mean-payoff games. Formal Methods in
System Design, 38(2):97-118, 2011.

3. A.K. Chandra, D. Kozen, and L.J. Stockmeyer. Alternation. J. ACM, 28(1):114-133, 1981.

4. K. Chatterjee, L. Doyen, T.A. Henzinger, and J.-F. Raskin. Generalized mean-payoff and energy games. In Proc. of FSTTCS,
LIPIcs 8, pages 505-516. Schloss Dagstuhl - LZI, 2010.

5. K. Chatterjee and M. Henzinger. An O(n^2) time algorithm for alternating Büchi games. In Proc. of SODA, pages 1386-1399.
SIAM, 2012.

6. K. Chatterjee, M. Randour, and J.-F. Raskin. Strategy synthesis for multi-dimensional quantitative objectives. In Proc. of
CONCUR, LNCS 7454, pages 115-131. Springer, 2012.

7. C. Dufourd, A. Finkel, and P. Schnoebelen. Reset nets between decidability and undecidability. In Proc. of ICALP, LNCS
1443, pages 103-115. Springer, 1998.






8. A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. Int. Journal of Game Theory, 8(2):109-113,
1979.

9. E.A. Emerson and C.S. Jutla. Tree automata, mu-calculus and determinacy. In Proc. of FOCS, pages 368-377. IEEE Computer
Society, 1991.

10. N. Fijalkow and F. Horn. The surprizing complexity of generalized reachability games. CoRR, abs/1010.2420, 2010.

11. J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer, 1997.

12. T. Gawlitza and H. Seidl. Games through nested fixpoints. In Proc. of CAV, LNCS 5643, pages 291-305. Springer, 2009.

13. H. Gimbert and W. Zielonka. When can you play positionally? In Proc. of MFCS, LNCS 3153, pages 686-697. Springer,
2004.

14. V.A. Gurvich, A.V. Karzanov, and L.G. Khachiyan. Cyclic games and an algorithm to find minimax cycle means in directed
graphs. USSR Computational Mathematics and Mathematical Physics, 28(5):85-91, 1988.

15. H. Björklund and S. Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff
games. Discrete Applied Mathematics, 155:210-229, 2007.

16. M. Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Inf. Process. Lett., 68(3):119-124, 1998.

17. M. Jurdziński, J. Sproston, and F. Laroussinie. Model checking probabilistic timed automata with one or two clocks. Logical
Methods in Computer Science, 4(3), 2008.

18. A.V. Karzanov and V.N. Lebedev. Cyclical games with prohibitions. Math. Program., 60:277-293, 1993.

19. R. Lazić, T. Newcomb, J. Ouaknine, A.W. Roscoe, and J. Worrell. Nets with tokens which carry data. Fundam. Inform.,
88(3):251-274, 2008.

20. Y.M. Lifshits and D.S. Pavlov. Potential theory for mean payoff games. Journal of Mathematical Sciences, 145(3):4967-4974,
2007.

21. T.M. Liggett and S.A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Review,
11(4):604-607, 1969.

22. D.A. Martin. Borel determinacy. Annals of Mathematics, 102(2):363-371, 1975.

23. M.L. Minsky. Recursive unsolvability of Post's problem of "tag" and other topics in theory of Turing machines. The Annals
of Mathematics, 74(3):437-455, 1961.

24. N.N. Pisaruk. Mean cost cyclical games. Mathematics of Operations Research, 24(4):817-828, 1999.

25. P. Schnoebelen. Verifying lossy channel systems has nonprimitive recursive complexity. Inf. Process. Lett., 83(5):251-261,
2002.

26. Y. Velner, K. Chatterjee, L. Doyen, T.A. Henzinger, A. Rabinovich, and J.-F. Raskin. The complexity of multi-mean-payoff
and multi-energy games. CoRR, abs/1209.3234, 2012.

27. Y. Velner and A. Rabinovich. Church synthesis problem for noisy input. In Proc. of FOSSACS, LNCS 6604, pages 275-289.
Springer, 2011.

28. U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343-359,
1996.






